OPTIMIZATION AND ANALYTICS FOR AIR TRAFFIC MANAGEMENT

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF MANAGEMENT SCIENCE AND ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Michael Jacob Bloem
May 2015

© 2015 by Michael Jacob Bloem. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/jh561fd9930

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Nicholas Bambos, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Juan Alonso

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Yinyu Ye

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

The air traffic management system is important to the United States’ economy and way of life. Furthermore, it is complex and largely controlled by human decision makers. We studied and learned from these expert decision makers to facilitate the transition to an increasingly autonomous air traffic management system that leverages the strengths of both computer systems and humans to provide greater value to stakeholders. In particular, we constructed decision models and corresponding solution algorithms that enable decision-support tool development. Our approach to building the decision models and algorithms leveraged expert input and feedback, operational decision data analytics, fast-time simulations, and human-in-the-loop simulations. We utilized and extended techniques from optimization, dynamic programming, and machine learning both for developing solution algorithms and for making inferences about decisions based on operational data.

In this dissertation we discuss our research on three types of decisions in the air traffic management system. The first is faced by supervisors of air traffic controllers: how to configure available airspace, controllers, and other resources to ensure safe and efficient operations in a region of airspace over a period of time. We describe a prescriptive decision model and solution algorithm that returns multiple good and distinct configuration schedule advisories. The second type of decision is faced by airlines: how to assign a set of flights to a set of slots in an Airspace Flow Program. We developed a novel heuristic that finds delay cost model noise parameters to maximize an approximation of the likelihood of operational data describing how airlines have done this historically. We thereby provide new insights into models of the cost of delay to an airline, which are fundamental to air traffic flow management research aimed at decision-support tool development. The third type of decision is faced by air traffic flow managers: when to implement a Ground Delay Program. We use operational data to build two types of models of the implementation of Ground Delay Programs. The descriptive models we developed can be used to predict Ground Delay Program implementation, which may be of value in decision-support tools for stakeholders such as airlines. They also provide insights into current practice that could motivate the development of tools to support the traffic flow managers who decide when to implement Ground Delay Programs.

Acknowledgements

In 1 Corinthians 4:7, the apostle Paul asks rhetorically “What do you have that you did not receive?” The answer of course is “nothing,” and so I have much and many to acknowledge. First and foremost, I am grateful to God, who through the life, death, and resurrection of Jesus Christ has graciously rescued me in the most important way and given my life meaning (Ephesians 2:8–10). My work too has profound purpose because of him (Genesis 1:26–28, 1 Corinthians 10:31, and Colossians 3:23–24), and while work can be discouraging (especially research), he has often enabled me to rejoice in the “toil” he has given me (Ecclesiastes 5:18–19).

God has used many people to facilitate the work recorded here. Prof. Nick Bambos took a chance when he agreed to be my advisor. I have been an unconventional part-time PhD student and Nick had not previously studied air traffic management. In spite of the risks and challenges associated with advising me, Nick agreed to do so. I’m grateful for his sensitivity to my situation as an employee at NASA, and more recently also as a father. The enthusiasm and curiosity he brings to our research has been invigorating. It has been a pleasure to learn from him these past few years. My friend and former colleague Tansu Alpcan introduced me to Nick. This was not the first time that Tansu provided me with significant assistance in my academic and professional endeavors.

I am grateful to other faculty members and students at Stanford for their instruction, support, and encouragement. The sequence of courses on convex optimization taught by Prof. Stephen Boyd was an incredible start to my training at Stanford, and I’m thankful to Eric Meuller and Stephen Schuet for helping me survive these. Prof. Ben Van Roy’s course on approximate dynamic programming, Prof. Andrew Ng’s course on machine learning, and Prof. Ramesh Johari’s course on game theory were particularly formative. My collaboration with David Hattaway on our machine learning course project injected much-needed sophistication into how I handle data and write software. I doubt I would have passed the qualifying examination without the instruction and accountability provided by Jing Ma and Diana Negoescu. Current and former students in my research group, including Praveen Bammannavar, Jeff Mounzer, Lawrence Chow, Martin Valdez-Vivas, Neal Master, Kevin Schubert, and Zhengyuan Zhou, have provided helpful feedback, encouragement, and inspiration during my PhD studies. More recently, I’ve benefited from collaborations with Jon Cox and Prof. Mykel Kochenderfer in the Department of Aeronautics and Astronautics. Profs. Yinyu Ye and Juan Alonso have graciously provided helpful feedback on my research as part of my PhD reading committee.

Many of my colleagues at NASA have been encouraging and flexible as I’ve pursued my PhD studies. The unwavering support of my supervisors over the course of seven years of study has made this research possible. These supervisors include Dr. Robert Windhorst, William Chan, Dr. Jeff Schoeder, Tom Davis, Kathy Lee, and Sandy Lozito. NASA Ames Research Center Director of Aeronautics Dr. Tom Edwards even agreed to be part of my oral defense committee, and provided fantastic feedback and guidance in that capacity. I’m especially thankful to my colleague Dr. Banavar Sridhar for initiating the process that led to my employment at NASA, as well as for years of mentoring. Collaborators and project leaders at NASA have provided essential technical guidance. These include Dr. Heather Arneson, Haiyun Huang, Dr. Karl Bilimoria, Dr. Michael Drew, Chok Fung (Jack) Lai, Greg Wong, Dr. Pramod Gupta, Dr. Avijit Mukherjee, Dr. Kapil Sheth, Dr. Shon Grabbe, Dr. Paul Lee, Dr. Laurel Stell, and Dr. Deepak Kulkarni.
I’m also grateful to Robie Remple for his meticulous review of this dissertation, which led to many corrections and improvements in the grammar and style of the document. I received valuable guidance and feedback on this research from several of the experts who make the air traffic management system work, or who did so for many years before they retired. Mark Evans and Brian Holguin answered many questions about airspace, operating position, and workstation configurations. I would also like

to thank several individuals at Cleveland Air Route Traffic Control Center for providing valuable input and feedback regarding my research. These individuals include but are not limited to Mark Madden, Brian Hanlon, Kevin Shelar, Al Mahilo, Tom Roherty, Connie Atlagovich, Bill Hikade, Steve Herbruck, Martin Mielke, Don Lamoreaux, Rick Buentello, Dale Juhl, Todd Wargo, Mike Klupenger, Mark McCurdy, and Stephen Hughes. Raphell Taylor, Miguel Anaya, Wayne Bridges, and Bill Preston, all current or former employees of Oakland Air Route Traffic Control Center, also provided useful input and feedback. David Hattaway put me in touch with Cindy Hood, Supervisory Traffic Management Coordinator at New York TRACON, whom I thank for valuable insights into current Ground Delay Program decision making. Michael Brennan provided me with helpful information about airline slot utilization and related data.

I would certainly not have arrived at this point in my studies without the support, love, and prayers of my family and friends. My wife Sarah has been a constant reminder and demonstration of God’s permanent love for me, which thankfully does not depend on my academic accomplishments or good works in general. Although I too often take her for granted, I am grateful for her patient support through the ups and downs of graduate studies. I’m also grateful for the joyful gift of our children Andrew and Joanna (and baby #3 due in June 2015); they have provided a delightful and challenging purpose outside of research. My parents have invested tremendous toil and treasure in me and my education, and I’m confident that God has worked through their faithful prayers. Many friends and neighbors have supported and cared for me and my family these past few years. I’ll restrain myself to only listing the names of those who provided advice on academic matters: Dr. Tony Evans, Robbie Bunge, Dr. Paul Varkey Parayil, and Dr. Ryder Winck.
Finally, I am grateful to God for providing the circumstances under which I have pursued my PhD. Generous financial support for my studies came from the training budget of the Aeronautics Directorate at NASA Ames Research Center. The Honors Cooperative Program, offered through the Stanford Center for Professional Development, provided a mechanism for me to complete a PhD as a part-time student and full-time employee at NASA. The Department of Management Science and Engineering was gracious enough to accept me as part of this program, even though this arrangement is not typical. When pondering these circumstances, I often conclude that they were too good to be true.

Contents

Abstract

Acknowledgements

1 Introduction
   1.1 Air Traffic Management
      1.1.1 Importance
      1.1.2 Complexity
      1.1.3 Pervasive Control by Human Decision Makers
   1.2 Research Objective and Approach
   1.3 Dissertation Overview

2 Decision-Support Tool for Area Supervisors
   2.1 Introduction
   2.2 Background
      2.2.1 Area of Specialization Configurations
   2.3 Decision Model
      2.3.1 Configuration Schedule Advisory Problem
   2.4 Solution Algorithm for CSA
   2.5 Default Decision Model Parameters
      2.5.1 Static and Reconfiguration Cost Parameters
      2.5.2 Reconfiguration Weight
   2.6 Reconfiguration Weight Parametric Study
   2.7 Multiple Advisories Optimization Problem
      2.7.1 M ε-Optimal d-Distinct Configuration Schedule Advisories Problem
      2.7.2 Motivation
   2.8 Solution Algorithms for M-ε-d-CSAs
      2.8.1 Value Iteration Fraction Optimal with Exhaustive Advisory Search
      2.8.2 Forward and Backward Value Iteration with Sequential Advisory Search
      2.8.3 Sequential Distinct A∗ Algorithms
      2.8.4 Lowest-Cost Paths
      2.8.5 Computational Complexity Comparison
   2.9 Fast-Time Simulations of Algorithms for M-ε-d-CSAs
      2.9.1 Investigation of the Performance of Algorithms on Small Problem Instances
      2.9.2 Investigation of the Performance of Algorithms Using a Year of Data
   2.10 Human-in-the-Loop Experiment Results
   2.11 Conclusions

3 Airline Delay Cost Model Evaluation
   3.1 Introduction
   3.2 Background
   3.3 Method
      3.3.1 Airline Decision Model
      3.3.2 Estimation Problem Statement
      3.3.3 Linear Program Cost Approximate Maximum Likelihood Estimation
      3.3.4 Simulation Approximate Likelihood Estimation
   3.4 Validation
      3.4.1 Implementation Notes
      3.4.2 Comparison of Heuristics
      3.4.3 LPCAMLE Validation with Synthetic Matching Data
   3.5 Evaluating Airline Delay Cost Models
      3.5.1 Data
      3.5.2 Results
   3.6 Conclusions

4 Ground Delay Program Implementation Models
   4.1 Introduction
   4.2 Data
      4.2.1 Weather Observations
      4.2.2 Weather Forecast
      4.2.3 Number of Scheduled Arrivals
      4.2.4 Current Airport State
      4.2.5 Predictions of Future Airport Arrival Rates
      4.2.6 Reroutes
      4.2.7 Previous GDP Plan
      4.2.8 Ground and Air Buffers
   4.3 GDP Models
      4.3.1 GDP Implemented Models
      4.3.2 GDP Parameters Model
   4.4 Experiments
      4.4.1 Parametric Studies Related to Look-ahead
      4.4.2 Prediction Quality Results
      4.4.3 Insight Results
   4.5 Conclusions

5 Conclusions and Future Work
   5.1 Contributions
   5.2 Future Work

A Forward A∗ Algorithm

B The NP-Completeness of M-ε-d-CSAs

C Reverse Value Iteration Algorithm

D Recursive Value Iteration Fraction Optimal Algorithm

E Properties of FBVISAS
   E.1 Simple Reconfiguration Cost-Dominated Problem Instances
   E.2 Simple Static Cost-Dominated Problem Instances with d > 1

F Forward Distinct A∗ Algorithm

G Properties of FDA∗

H Forward Distinct A∗ with Shortcuts Algorithm

Bibliography

List of Tables

2.1 Static Cost Parameters
2.2 Reconfiguration Cost Parameters
2.3 Computational Complexity of Algorithms
2.4 Fraction of Problem Instances with One, Two, or Three Advisories Returned

3.1 Comparison of Heuristics on Sample Problem Instances (σ² = 25)
3.2 Candidate Cost Models with Largest Approximate Log-Likelihood for Airline E
3.3 Candidate Cost Models with Largest Approximate Log-Likelihood for Airline G
3.4 Delay Cost Models
3.5 Cost Models with Largest Approximate Log-Likelihood

4.1 Confusion Matrices for EWR BC GDP Implemented Model
4.2 Confusion Matrices for SFO BC GDP Implemented Model
4.3 Confusion Matrices for EWR IRL GDP Implemented Model
4.4 Confusion Matrices for SFO IRL GDP Implemented Model

4.5 Properties of Parameters of R̂C for EWR and SFO

List of Figures

1.1 Human decision makers involved in air traffic management
1.2 Enabling decision support tool development with decision modeling and solution algorithms
1.3 Approach to modeling decisions and developing algorithms

2.1 Sectors and sample configuration of ZOB area 4
2.2 Traffic time step static cost for an open sector
2.3 Portion of the graph for a sample CSA problem instance
2.4 Cost trade-off curve for 1 May 2012
2.5 Summed cost ratio error versus βR
2.6 Distribution of open sector–minutes versus βR
2.7 Cumulative distributions of open sector instance durations
2.8 One dimension of the space of problems involving finding M paths
2.9 Excess cost fraction achieved by second advisories
2.10 Distribution of the ratio of costs
2.11 Distributions of computation time required by the SDA∗-SC and FBVISAS heuristics
2.12 Screenshot of the OASIS decision-support tool

3.1 Normalized standard deviation estimates for synthetic data generated with four normalized actual standard deviation values
3.2 Histogram of the number of matching messages for each airline in the dataset
3.3 Histograms of the number of flights and slots for the matchings of two airlines

4.1 Ground and air buffer system model
4.2 Structure of the GDP model
4.3 Structure of the BC GDP Implemented model
4.4 Structure of the CSI algorithm
4.5 Prediction quality of BC GDP Implemented model as look-ahead time horizon changes
4.6 Fit of reward regressor as γ changes
4.7 Features with highest importance scores for the EWR BC GDP Implemented model
4.8 Features with highest importance scores for the SFO BC GDP Implemented model

Chapter 1

Introduction

1.1 Air Traffic Management

Air traffic management (ATM) refers to the system of systems that enables safe and efficient aircraft flights. ATM includes air traffic control (ATC) and air traffic flow management (TFM). ATC involves monitoring and controlling aircraft to ensure that they stay safely separated and can operate efficiently. It typically seeks to identify and resolve issues that could arise in the next twenty minutes. TFM is more strategic: it delays flights or adjusts their trajectories as much as several hours in advance so that ATC is not presented with a set of flights that are difficult to efficiently keep separated. In other words, TFM attempts to avoid exceeding airspace and airport capacity levels that are set to ensure safe ATC. TFM also seeks to utilize available capacity fairly and to give flight operators the flexibility to behave in accordance with their preferences or business models.

1.1.1 Importance

ATM enables civil aviation, which is an important part of the United States (US) economy and an enabler of our way of life. Civil aviation includes airline operations, transportation of cargo by air, general aviation, and aircraft manufacturing. Economic activity related to civil aviation amounted to $1.5 trillion in 2012, which was


5.4% of the US gross domestic product [8]. Civil aviation also supported 11.8 million jobs in 2012 [8]. The positive trade balance for civil aircraft manufacturing was $54.3 billion in 2012, making it the top net exporting industry in the US [8]. Many people take many trips on airlines that can only operate because of ATM: 837.2 million passengers were carried by airlines operating in US airspace in 2012 [8]. The value of these trips is difficult to quantify, especially for personal trips. However, attempts have been made to quantify the value of business travel, which often involves travel by air. Econometric analysis and surveys of executives suggest that “for every dollar invested in business travel, companies realize $12.50 in incremental revenue” [2]. Even those who never fly can benefit from the transportation of cargo by air. The $61.2 billion revenue ton-miles of freight that passed through US airports in 2012 provide one quantification of this benefit [8]. The average of 21,000 tons of cargo carried per day by US airlines provides another [49].

Furthermore, the importance of this system is expected to increase. Boeing predicts that from 2013 to 2033, airline passenger traffic in the US (measured by revenue passenger kilometers) will grow at 2.9% annually and cargo traffic (revenue ton kilometers) will grow at 3.4%, both of which exceed the predicted annual gross domestic product growth rate (2.5%) [7]. Airbus expects even faster annual airline passenger growth (3.0%) in North America over a similar time period [4]. According to Federal Aviation Administration (FAA) predictions, the number of enplaned passengers on US commercial air carriers will grow at 2.2% annually from 2014 to 2034 to a total of 1.15 billion in 2034 [9].
Researchers have estimated that by 2050 the average American’s mobility (distance traveled per year) will increase by a factor of 2.6 over what it was in 1990, with the bulk of that travel distance (71%) being covered by high-speed options such as air travel [112].

1.1.2 Complexity

ATM is a complex system of systems. In the US, more than 50,000 commercial flights operate on some days, and there can be more than 5,000 flights operating at any given moment [116]. The complexity of the system is magnified because it is

operating near its capacity. Between 2005 and 2014, 17–24% of flights experienced arrival delays, and 1.3–2.2% were canceled [37]. These numbers are larger for the most congested airports. For example, 24–36% of flights scheduled to arrive at Newark Liberty International Airport were delayed and 2.5–4.7% were canceled [37]. Delays at these congested airports can ripple throughout the system [49]. Weather is the primary cause of delay, but congested airspace near airports contributes to about 20% of these flight delays [49]. Additional complexity arises because the system involves multiple stakeholders whose objectives do not always align; these include passengers, airlines, airports, general aviation pilots, and the FAA. Multiple sources of uncertainty also lead to complexity [103, 109].

There are several reasons to expect the complexity of the US ATM system to increase in coming decades. Although the number of operations at FAA and contract towers has decreased since 2005, the FAA predicts that these will grow at 1.0% per year overall from 2014–2034 and at 1.7% at large hub airports, some of which are already operating at or near capacity [9]. The FAA also anticipates 1.7% annual growth in the number of aircraft handled by en-route ATC centers over this period. Airbus reports that there were 4,104 passenger aircraft with at least 100 seats in service at the beginning of 2013, but that number will grow to 6,394 by 2032 (2.2% annual growth) [4]. Boeing predicts 1.6% annual growth in the size of airline fleets in North America, and that there will be a total of 9,120 aircraft in these airline fleets by 2033 (up from 6,650 in 2013) [7]. The variety in aircraft (such as unmanned aircraft systems and personal air vehicles) and mission types (such as commercial space launches and reentries) is likely to increase, and these will require new procedures [9, 44, 49, 116]. For example, the number of unmanned aircraft systems in operation could reach 250,000 by 2035 [49].
Cybersecurity issues may impact the system [75]. Climate change may cause rising sea levels and more extreme weather events, which would create new challenges and additional complexity [36, 38, 49]. For example, intense storms may become more common, and resulting storm surges could impact low-lying airports. Some of the most important and busiest airports in the US, such as San Francisco International and the three main airports serving the New York City area, are vulnerable to such storm surges [49]. Furthermore, aviation makes

a non-trivial impact on the climate: it is responsible for 13% of transportation-related and 2% of total global carbon dioxide emissions [3]. As aviation seeks to reduce its contribution to climate change, yet another objective will need to be considered by decision makers. Slim profit margins for airlines and budgetary issues facing the FAA suggest that pressures for both entities to reduce costs will not diminish even while they operate in or manage this increasingly complex system [49, 55, 74].

1.1.3 Pervasive Control by Human Decision Makers

The operation of the ATM system depends on interactions between many decisions made by many humans. In some complex systems, such as the internet, humans use devices that make requests of the system but the control of the system is largely automated, except perhaps at strategic time horizons. In the ATM system, on the other hand, humans make and execute decisions that meaningfully control the system at both tactical (ATC) and strategic (TFM) time horizons.

[Figure: flight-operator decision makers (pilots, 1000s; dispatchers, 100s; operations managers, 10s) send demand for resources and services to FAA decision makers (controllers, 1000s; supervisors, 100s; flow managers, 10s), who provide separation assurance and flow management.]

Figure 1.1: Human decision makers involved in air traffic management.

Figure 1.1 depicts interactions between decision makers at airlines and the FAA and includes estimates of how many of each type of decision maker might be operating the system at a busy time. Flight operators, such as airlines, request services, such as separation from other aircraft, and also the use of ATM resources, such as runways and

airspace. At tactical time frames, these requests are made by thousands of pilots who communicate directly with controllers and then select and implement control actions to operate the aircraft. Hundreds of dispatchers, each managing a few flights more strategically than pilots, also make decisions that impact these requests. For example, they determine which flight plan to file for each flight. Finally, tens of operations managers at airline operation centers make strategic decisions about dozens of flights. For example, they help determine which flights will use which of the slots that are allocated to an airline in a Ground Delay Program or Airspace Flow Program [15, 111].

As it executes ATC and TFM, the FAA provides these services and manages demand for these resources. Thousands of controllers interact with pilots to ensure separation at tactical time frames of tens of minutes. Hundreds of supervisors of controllers determine how to configure resources, such as controllers and airspace, over time horizons of an hour or more to ensure that flights can operate safely and efficiently in a region of airspace. Finally, tens of flow managers working at facilities such as the Air Traffic Control System Command Center make strategic decisions that affect flows of flights over time horizons extending to six hours in the future. For example, they determine when to implement a Ground Delay Program and select the program parameters.

Just like the important tactical decisions that ensure separation and therefore preserve life and property, strategic decisions have meaningful effects on the value that stakeholders derive from the ATM system. Ground Delay Programs can lead to thousands of total minutes of ground delay that must be absorbed by hundreds of flights. The costs of these delays are nontrivial for airlines. Air transportation delays in the US were estimated to have cost airlines $8.2 billion in 2007, a year in which their total profits were just $5 billion [14, 18].
For decades, the FAA and airlines have worked together to identify collaborative processes and supporting technologies that will help ensure good strategic control of the ATM system, which is a testimony to the importance of these TFM decisions [15].

1.2 Research Objective and Approach

Given that the ATM system is an important system that is largely controlled by human decision makers facing complex decisions, we seek to improve those decisions. The objective of the research presented in this dissertation is to enable the development of decision-support tools for the decision makers that control the ATM system. Such tools could take advantage of computing power and real-time data to provide relevant predictions of future outcomes, values for metrics that guide decisions, or even suggested decisions. These tools are a first step towards an ATM system that is better at leveraging the strengths of both automation and humans to provide value to stakeholders. We speculate that such a system will be increasingly autonomous, with computer-based systems contributing more and more to decision making and even learning from mistakes and successes as they do so, but with humans continuing to play an essential supervisory role [6]. There are many challenges involved with increasing the autonomy of the ATM system [6]. Our research seeks to help overcome these challenges largely by learning from the expert decision makers that currently control this system, either by directly interacting with them or by studying data describing their decisions.

Figure 1.2 depicts our objective. Much goes into decision-support tool development that is not covered in this dissertation. For example, we do not do any research related to user interface design. Therefore, the decision-support tool box is left white in Fig. 1.2. We seek to contribute to decision-support tool development by 1) using a decision-theoretic framework to model relevant decisions, and then 2) developing algorithms to solve those models. To model decisions in this framework, we specify constraints and an objective to be maximized or minimized [84].
The ATM system is dynamic, so the constraints may include system dynamics (e.g., how current decisions impact which situations will be encountered in the future). Relevant uncertainties could also be included in these decision models, but we did not pursue this aspect of decision modeling in this dissertation [84]. Each such decision model specifies an optimization problem, and our second contribution is a set of algorithms that attempt to solve these problems.
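In generic form, a decision model of this kind can be sketched as a finite-horizon optimization over a sequence of decisions. The notation below is ours, introduced only for illustration; the dissertation defines problem-specific formulations in later chapters.

```latex
\begin{aligned}
\underset{u_0, \ldots, u_{T-1}}{\text{minimize}} \quad
  & \sum_{t=0}^{T-1} c_t(x_t, u_t) \\
\text{subject to} \quad
  & x_{t+1} = f_t(x_t, u_t), \quad t = 0, \ldots, T-1, \\
  & u_t \in \mathcal{U}_t(x_t), \quad t = 0, \ldots, T-1,
\end{aligned}
```

Here $x_t$ is the system state (e.g., the current airspace configuration), $u_t$ is the decision, $c_t$ is a stage cost encoding the objective, $f_t$ captures the system dynamics, and $\mathcal{U}_t(x_t)$ is the set of feasible decisions. A prescriptive model is solved for the decisions $u_t$, while a descriptive model treats components such as $c_t$ as unknown and infers them from observed decisions.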


Figure 1.2: Enabling decision support tool development with decision modeling and solution algorithms.

There are at least three ways that these models and solution algorithms can enable decision-support tool development. The path directly from decision model to solution algorithm to decision-support tool in Fig. 1.2 is the most straightforward. If a decision model that prescribes how decisions should be made can be formulated such that a solution algorithm can find resulting decisions for various situations, then these decisions can be suggested in a decision-support tool. We enabled a decision-support tool for controller supervisors by following this path (see Chapter 2). Decision models that are descriptive rather than prescriptive open up other means of enabling decision-support tools. Just the form of the decision model, particularly the objective, can produce insights that guide researchers as they seek to develop tools, as depicted in the right-most path to a decision-support tool in Fig. 1.2. For example, a tool can be as simple as a dashboard that displays metrics related to decision objectives. The general form of objectives can also help researchers seeking to design mechanisms for allocating ATM resources. Chapter 3 describes how we produced descriptive decision

models of airline utilization of slots in Airspace Flow Programs. These models, especially the inferred decision maker objectives, can produce insights that guide further TFM research and decision-support tool development. Finally, descriptive decision models and corresponding solution algorithms can produce predictions of decisions, as depicted by the left-most path to a decision-support tool in Fig. 1.2. Predictions of decisions can be valuable in decision-support tools because many outcomes in the ATM system are the result of decisions made by many participants. Therefore, accurate predictions of the decisions of others can help participants make decisions that lead to desired outcomes. We followed this path when we developed a descriptive model and solution algorithm as a way to predict Ground Delay Program implementation (see Chapter 4).

Several sources of information and feedback loops can be used when developing decision models and solution algorithms. Figure 1.3 depicts the sources of information and feedback loops that made up our approach, particularly for the development of a prescriptive model. The various sources of information are color-coded to depict how easy they are to access when working on the ATM system (green = easy, yellow = moderately difficult, red = difficult). Expert input is the preferred starting point when developing decision models. This input can take the form of observations of decision making or discussions with decision makers. More quantitative input can be derived through surveys. While there are thousands of individuals making decisions in the ATM system, they are not always easy to access. Furthermore, it can be difficult to elicit input from experts that leads to a precise specification of relevant decisions. A more readily accessible source of information is operational decision data.
There are certainly challenges associated with leveraging this data to infer constraints and especially an objective—doing so involves solving a challenging type of problem known as an inverse problem. The data are often plentiful and accessible, however, so finding data analytics techniques for these inverse problems can be worthwhile. Once a decision model and solution algorithm have been derived, expert decision makers can provide feedback on them directly. As experts may not be familiar with such models and algorithms, it may be easier to collect expert feedback on the results of fast-time simulations, which are generally easy to produce. Such fast-time simulations can also be used to directly provide feedback on decision models. For example, parametric studies can be carried out with fast-time simulations to help select appropriate parameter values. Furthermore, when multiple solution algorithms are under consideration, their relative performance in fast-time simulations can guide the selection of the best one for use in a decision-support tool. Finally, human-in-the-loop simulations, and expert feedback on their results, are a potent mechanism for evaluating and improving decision models and solution algorithms. In some domains it is easy to execute a human-in-the-loop experiment. For example, website design alternatives are regularly evaluated with data collected from site visitors who are randomly presented with one design or another.

Figure 1.3: Approach to modeling decisions and developing algorithms. (Diagram: expert input and operational decision data feed a decision model with constraints and an objective; the model feeds a solution algorithm, fast-time simulations, human-in-the-loop simulations, and a decision-support tool, with expert feedback loops throughout.)

When studying ATM, it is typically difficult and expensive to set up and execute a human-in-the-loop experiment in a laboratory, let alone in the operational system. Therefore, we see leveraging other sources of information, such as operational decision data and fast-time simulations, as particularly important when enabling and developing decision-support tools for ATM.

1.3 Dissertation Overview

Our research on enabling the development of decision-support tools for the ATM system is organized into three chapters. Each chapter describes a different ATM decision, how we modeled it, and how we developed corresponding solution algorithms. In addition, each chapter provides results and discussion that enable decision-support tool development via one or more of the paths depicted in Fig. 1.2.

In Chapter 2 we describe the development of a prescriptive decision model and solution algorithm for a resource allocation decision faced by controller supervisors. We worked closely with subject-matter experts to develop this decision model, and we used both expert feedback and operational decision data to determine appropriate model parameters. Certain components of the decision were not included in the model due to challenges associated with quantifying characteristics of individual controllers, and the objective was also difficult to specify precisely, so experts suggested that the solution algorithm return a set of good and distinct solutions rather than just one. The decision version of the optimization problem resulting from this suggestion is NP-complete (it is in the set of nondeterministic polynomial time complete problems), so we did not expect to find efficient algorithms to solve the optimization problem exactly. This led us to develop and evaluate novel heuristic solution algorithms. One of these was implemented in a decision-support tool, and the algorithm performed well in a human-in-the-loop experiment. The research covered in this chapter was originally published in a sequence of papers and articles [23, 24, 27–29]. While several people were involved in this line of research, the author of this dissertation made major contributions to the research described in and the writing of these papers and articles, and is the first author on each one.

The cost of delay to an airline can vary from flight to flight based on a variety of factors. A model of this cost is essential to TFM research aimed at new decision-support tools. Therefore, researchers have proposed a variety of models of this cost of delay. In Chapter 3 we describe how we used data describing airline decisions in Airspace Flow Programs to evaluate these cost models. To do so, we developed a novel data analytics technique that finds cost noise parameters to maximize an approximation of the likelihood of the airline decisions. This enabled us to find the cost models that achieve the largest approximate likelihood, as well as corresponding cost noise parameter estimates. This line of research was initially presented in [32] and [22]¹. Again, the author of this dissertation made major contributions to the research described in and the writing of these papers, and is the first author on both.

The last type of decision, described in Chapter 4, is whether or not to implement a Ground Delay Program. Ground Delay Programs are one of the most important and impactful TFM tools. We built models to predict hourly Ground Delay Program implementation; these predictions may be helpful in tools for impacted system users. Another objective for developing these models was to gain insight into how and why these implementation decisions are made, which may be of interest to those developing tools to assist in Ground Delay Program decision making. More specifically, we used historical data to develop random forest behavioral cloning models and inverse reinforcement learning models. After quantifying the ability of these models to predict Ground Delay Program initialization and cancellation, we studied their structure to gain insights into issues such as the degree to which Ground Delay Program decisions depend on conditions now or conditions anticipated in the next few hours. This material was initially published in [25] and [26].
Again, the author of this dissertation made major contributions to the research presented in and the writing of these publications, and is the first author on both.

¹In reference to IEEE copyrighted material which is used with permission in this dissertation, the IEEE does not endorse any of Stanford University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

Finally, concluding thoughts and ideas for future research are described in Chapter 5. We have demonstrated the value of our approach to improving the ATM system, but there are many other ATM decisions to be studied. Furthermore, the impact of our work on the ATM system will ultimately be determined by whether or not it enables improved decision-support tools or other increasingly autonomous systems that are actually deployed. Developing and deploying such tools and systems will involve much additional research and analysis.

Chapter 2

Decision-Support Tool for Area Supervisors

2.1 Introduction

In current air traffic management operations, a set of resources that make up an Area of Specialization (or just area) is configured by an area supervisor so that air traffic can operate safely and efficiently [57]. An area configuration specifies how airspace, air traffic controller personnel, and physical air traffic control equipment are utilized to control air traffic. An operating position is an air traffic control role to be filled by a controller monitoring a volume of airspace called an open sector; each open sector is allocated between one and three operating positions and assigned to a workstation. Some configurations allocate these resources in a way that facilitates safe and efficient operations. For example, a safe and efficient configuration would not ask a single controller to control too many aircraft at once, as this might require the controller to execute too many tasks in a short period of time. Neither would it ask a controller to control just a couple of aircraft at once, as this might make it difficult for the controller to remain engaged and attentive. Configurations are changed multiple times each day, but such changes require that additional tasks be performed by controller personnel, which may degrade the safety and efficiency of traffic operations for a period of time near the change [68]. For routine operations,


the selection of area configurations may not be particularly challenging because traffic and resource availability patterns do not change much from day to day. However, the selection of configurations may become more difficult in off-nominal conditions, such as when traffic flows change in response to weather, when some equipment fails, or when there is a shortage of controllers available to fill operating positions [87]. A decision-support tool might improve area configuration decisions, especially during these challenging situations.

As a step towards such a tool, several algorithms that could support tactical area configuration decision making have been developed [30, 33, 41, 51, 120, 121]. All but the work of Cano et al. seek to minimize the number of open sectors required to manage traffic over a period of time, which is not an appropriate objective for tactical area configuration decisions, when the objective is to ensure safe and efficient operations with the available resources. Furthermore, of these algorithms, only those developed by Tien consider area resources other than airspace (such as operating positions or workstations). Tien proposed a mixed-integer programming problem and solution method for suggesting airspace configurations that minimize the predicted or expected value of the number of operating positions [120, 121]. His approach utilizes a statistical model that estimates the probabilities that one or two operating positions will be allocated to each open sector, given the characteristics of the traffic in the open sector. An extension of this problem can also enforce requirements on the length of time between changes to open sectors.
This work does not attempt to simultaneously optimize operating position allocations or workstation assignments along with airspace configurations because it ignores workstations, it treats operating position allocations (given the open sectors) as an exogenous random process outside of the control of the optimization, and it handles reconfigurations by adding constraints rather than by adjusting the objective. Furthermore, even though they are all based upon incomplete and imperfect models, none of the proposed algorithms provide multiple advisory options for the supervisor to consider. This might be problematic because unmodeled or imperfectly-modeled aspects of area operations may make what an algorithm considers to be an optimal advisory unacceptable for implementation.

We propose to address these limitations of previous research by 1) developing a decision model that simultaneously considers airspace, operating position, and workstation configurations; 2) utilizing an objective function that considers how safe and efficient operations are impacted by both the match between configurations and traffic as well as changes in configurations; and 3) developing solution algorithms that can present the supervisor with a set of diverse advisories that all perform well according to the objective function. The decision model does not enforce restrictions on the time period between changes to open sectors, as Tien proposes in [120], but rather imposes a traffic-dependent cost on configuration changes. This cost on configuration changes ensures that configuration changes are only proposed when they generate sufficiently safer and more efficient operations by producing open sectors with traffic levels that keep controller personnel engaged but not overworked. Diversity in the set of proposed advisories will increase the likelihood that one of the advisories will perform well with respect to all aspects of the area configuration problem, even those that are unmodeled or imperfectly modeled [97].

This chapter begins with a more detailed description of area configurations in Section 2.2. Then, Section 2.3 discusses the prescriptive decision model that we developed for selecting a single configuration schedule advisory. It is followed by a presentation of a lowest-cost path algorithm that we use to solve the model in Section 2.4. We used expert feedback, operational data, and fast-time simulations involving this algorithm to determine default values for parameters in the objective of the model; these efforts are documented in Section 2.5.
Section 2.6 presents the impact of variations in an important objective function parameter on some operationally-meaningful metrics, and compares the metrics for algorithm-generated and corresponding historical configurations. Next, in Section 2.7 we motivate and describe an optimization problem that uses the decision model to request multiple good and distinct configuration schedule advisories. Four algorithms that can be applied to this multiple-advisories problem are motivated, specified, and discussed in Section 2.8. Then, Section 2.9 describes how we quantified the performance of the algorithms using fast-time simulations based on operational data. One of the algorithms was implemented in a decision-support tool that was evaluated in a human-in-the-loop experiment. We summarize relevant results from that experiment in Section 2.10. Finally, Section 2.11 reviews our contributions and main findings. The research covered in this chapter was originally published in a sequence of papers and articles [23, 24, 27–29]. While several people were involved in this line of research, the author of this dissertation made major contributions to the research described in and the writing of these papers and articles, and is the first author on each one.

2.2 Background

2.2.1 Area of Specialization Configurations

Airspace is partitioned into predefined volumes called sectors to facilitate the division of responsibilities between air traffic controllers. An airspace configuration maps a set of sectors to a set of open sectors such that each sector is assigned to exactly one open sector. A team of air traffic controllers, staffing one to three operating positions, monitors each open sector. At a minimum, a radar (also known as R-side) operating position is allocated to each open sector. A radar associate or data (also known as D-side) operating position can also be allocated to an open sector. Although rare, a third operating position can be allocated to an open sector. When more operating positions are allocated to an open sector, the tasks associated with controlling traffic in the open sector are divided among more controllers. An operating position configuration specifies how many operating positions are allocated to each open sector in the corresponding airspace configuration. Furthermore, each open sector is monitored from a particular workstation consisting of seats for air traffic controllers, a radar scope, plugs for headsets, and other equipment used by controllers to monitor traffic. Which workstation is utilized to monitor an open sector can influence how much work is involved when the open sector is changed by adding or removing sectors from it. For example, suppose an open sector consisting of two sectors has 15 aircraft in it, but that 12 of the aircraft are in one sector and 3 are in the other. Furthermore, suppose a reconfiguration is to occur in which one of these sectors

will be removed from the open sector and be assigned to its own open sector, operated from a different workstation. There are two choices: either transition the sector with 12 aircraft to the new open sector at the other workstation or transition the sector with 3 aircraft to the new open sector at the new workstation. Transitioning the sector with just 3 aircraft to the new open sector at the other workstation requires fewer tasks and is less disruptive. Therefore, the choice of workstation for these two open sectors can impact the safety and efficiency of corresponding air traffic operations. A workstation configuration specifies which workstation is utilized for monitoring each open sector in a corresponding airspace configuration. Together, a set of corresponding airspace, operating position, and workstation configurations will be referred to simply as an area configuration.

For example, the shapes of the five sectors in area 4 of Cleveland Air Route Traffic Control Center (ZOB) as of 20 October 2011 are shown in Fig. 2.1(a). The shapes of the open sectors in a sample airspace configuration are shown in Fig. 2.1(b) and the floor layout of corresponding operating position and workstation configurations is shown in Fig. 2.1(c). The airspace configuration contains four open sectors. Three of these open sectors consist of airspace from only a single sector (ZOB45, ZOB46, and ZOB48). These three open sectors are each allocated two operating positions (indicated by the number in parentheses in Figs. 2.1(b) and 2.1(c)). The fourth open sector consists of the combined airspace of sectors ZOB47 and ZOB49 and is controlled by a single operating position. In Fig. 2.1(c), the two workstations on the left side are used by the four operating positions allocated to the open sectors consisting of ZOB45 and ZOB46. The workstation at the top of the right side is used by the R- and D-side operating positions allocated to the open sector consisting of ZOB48.
Finally, the single R-side operating position controlling the open sector consisting of ZOB47 and ZOB49 is using the bottom workstation on the right side.

This specification of area configurations is incomplete. The main missing component is a mapping of available controllers to operating positions. This component of configurations is excluded from the decision model because factors that influence this component, such as controller skill, fatigue, and personality, may be difficult to quantify for use in a decision-support tool. This component of the configuration is

Figure 2.1: Sectors and sample configuration of ZOB area 4. ((a) Sectors ZOB45, ZOB46, ZOB47, ZOB48, and ZOB49; (b) sample airspace configuration with open sectors [45](2), [46](2), [48](2), and [47, 49](1); (c) sample operating position and workstation configurations.)

left for the supervisor to determine without the assistance of an advisory. However, by providing multiple configuration advisories, instead of just a single advisory, we hope to increase the likelihood that the supervisor will be presented with an option that performs well enough with respect to this unmodeled aspect of configurations as well as other imperfectly-modeled aspects (see Section 2.7).

2.3 Decision Model

In this section we will describe the decision model, which can be translated into a lowest-cost path problem on a time-extended graph.

2.3.1 Configuration Schedule Advisory Problem

The Configuration Schedule Advisory (CSA) problem specifies a lowest-cost path problem on a time-extended graph. The solution of this problem corresponds to an area configuration schedule advisory.
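The translation to a lowest-cost path problem can be sketched as a dynamic program over a time-extended graph in which a node is a (configuration time step, configuration) pair and an edge carries the single-step cost g_k. The following Python sketch is illustrative only; `valid_configs` and `step_cost` are assumed interfaces, not the dissertation's implementation.

```python
# Sketch: solving a CSA-style lowest-cost path problem by dynamic
# programming over a time-extended graph. A node is (time step k,
# configuration); an edge from (k-1, prev) to (k, cfg) costs
# step_cost(k, prev, cfg), standing in for g_k.

def lowest_cost_schedule(K, current_config, valid_configs, step_cost):
    """valid_configs[k] lists valid configurations at step k = 1..K;
    step_cost(k, prev, cfg) returns the single-step cost.
    Returns (total cost, configuration schedule C_0..C_K)."""
    # best[cfg] = (cost of cheapest path reaching cfg at step k, that path)
    best = {current_config: (0.0, [current_config])}
    for k in range(1, K + 1):
        new_best = {}
        for cfg in valid_configs[k]:
            # choose the cheapest predecessor configuration at step k - 1
            new_best[cfg] = min(
                ((c + step_cost(k, prev, cfg), path + [cfg])
                 for prev, (c, path) in best.items()),
                key=lambda x: x[0],
            )
        best = new_best
    return min(best.values(), key=lambda x: x[0])
```

Because every edge cost is evaluated once, the running time is linear in the number of time steps and quadratic in the number of valid configurations per step.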

Decision Variables

The time horizon of the schedule is broken into K + 1 discrete configuration time steps k = 0, 1, 2, ..., K of length Δ minutes. The configuration time step k = 0 is used for data describing the state of the area at the time an advisory is generated.

The decision variables that make up a configuration schedule advisory C are C_k for k ∈ {0, 1, 2, ..., K}, where C_k is the advised configuration at configuration time step k. More concretely, a configuration schedule advisory for configuration time step k is C_k = {C_k^A, C_k^OP, C_k^W}, and it consists of an airspace configuration C_k^A, a corresponding operating position configuration C_k^OP, and a corresponding workstation configuration C_k^W. For a given set of airspace sectors S = {s_1, s_2, ..., s_{|S|}} under consideration, an airspace configuration consists of a set of open sectors C_k^A = {σ_1, σ_2, ..., σ_{|C_k^A|}}. Each open sector σ ∈ C_k^A is itself a set consisting of at least one sector from S. An operating position configuration C_k^OP is a mapping that specifies whether one or two operating positions are allocated to each open sector in the corresponding airspace configuration.

The formulation can be extended to allow the allocation of three operating positions to an open sector, but in this dissertation we only allow the typical values of one or two operating positions per open sector. Finally, a workstation configuration C_k^W is a mapping from open sectors in C_k^A to the set of available workstations W.
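As a concrete illustration of these decision variables, the sample ZOB area 4 configuration of Fig. 2.1 can be encoded with simple Python data structures. The class layout and the workstation names ("L1", "L2", "R1", "R2") are illustrative assumptions; the dissertation does not specify an encoding.

```python
# Illustrative encoding of an area configuration C_k (assumed names).
from dataclasses import dataclass

@dataclass(frozen=True)
class AreaConfiguration:
    open_sectors: frozenset     # C_k^A: frozenset of frozensets of sector names
    operating_positions: tuple  # C_k^OP: (open sector, 1 or 2) pairs
    workstations: tuple         # C_k^W: (open sector, workstation) pairs

# The sample configuration of ZOB area 4 from Fig. 2.1; workstation
# identifiers are hypothetical labels for the floor-layout positions.
zob45 = frozenset({"ZOB45"})
zob46 = frozenset({"ZOB46"})
zob48 = frozenset({"ZOB48"})
zob47_49 = frozenset({"ZOB47", "ZOB49"})
config = AreaConfiguration(
    open_sectors=frozenset({zob45, zob46, zob48, zob47_49}),
    operating_positions=((zob45, 2), (zob46, 2), (zob48, 2), (zob47_49, 1)),
    workstations=((zob45, "L1"), (zob46, "L2"), (zob48, "R1"), (zob47_49, "R2")),
)
```

Frozen, hashable configurations are convenient here because the lowest-cost path formulation indexes path costs by configuration.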

Data

The traffic situation data T is a set consisting of a data element T_k for each configuration time step k ∈ {0, 1, ..., K}. Furthermore, each T_k contains the traffic situation data for each sector during configuration time step k: T_k = {T_k^{s_1}, T_k^{s_2}, ..., T_k^{s_{|S|}}}. Generally, this traffic situation data must contain any predicted air traffic data required to compute the problem objective function. Although many other objective function formulations are possible, the function specified in Section 2.3.1 requires that T_k^s contain a unique identifier for each flight in sector s at each traffic time step during configuration time step k. Since air traffic characteristics often change faster than airspace configurations, we further discretize time into traffic time steps of length δ minutes (where δ ≤ Δ). Let τ(k) be the set of traffic time steps in configuration time step k. Then each T_k^s is itself a set containing the traffic situation data in sector s at each traffic time step t ∈ τ(k): T_k^s = {T_t^s}_{t∈τ(k)}. Finally, each T_t^s contains a unique identifier for each aircraft located within s during t. Other data required for the objective function are the capacities of open sectors. The capacity of an open sector is the maximum number of aircraft that can safely be within the open sector simultaneously when the open sector is allocated two operating positions. An open sector Monitor Alert Parameter (MAP) is used as a capacity bound in current air traffic operations, and so MAP values are used as the required sector capacity data.
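The nesting of the traffic situation data can be sketched with plain dictionaries and sets. The dictionary layout, names, and the rule for deriving an open sector's MAP from its member sectors' MAPs (here, the maximum) are illustrative assumptions, not specified by the dissertation.

```python
# Illustrative nesting of the traffic situation data T (assumed layout):
# traffic[k][s][t] is the set of unique aircraft identifiers in sector s
# at traffic time step t within configuration time step k.
traffic = {
    1: {                                                  # k = 1
        "ZOB45": {0: {"AAL12", "UAL7"}, 1: {"AAL12"}},    # T_t^s by step t
        "ZOB46": {0: {"DAL3"}, 1: set()},
    },
}
map_values = {"ZOB45": 18, "ZOB46": 15}   # illustrative MAP capacities

def open_sector_load(open_sector, k, t, traffic, map_values):
    """Load: aircraft count in the open sector divided by its MAP value.
    The open sector MAP is assumed here to be the max of member-sector
    MAPs; the dissertation simply requires MAP data per open sector."""
    count = len(set().union(*(traffic[k][s].get(t, set())
                              for s in open_sector)))
    return count / max(map_values[s] for s in open_sector)
```

Using sets of unique flight identifiers, rather than raw counts, matches the requirement that T_t^s identify each aircraft, and it lets unions over sectors and time steps avoid double-counting.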

Constraints

A configuration schedule advisory C must be in the set 𝒞 of all valid configuration schedules. Although 𝒞 could be defined more generally, here it is specified as a set of valid configurations 𝒞_k at each configuration time step: 𝒞 = {𝒞_k}_{k=0}^{K}. The set 𝒞_0 contains only the current configuration C_0. We typically utilize historical data to help construct 𝒞, but there may be valid configurations that could be included in 𝒞 even if they have not been used historically. To ensure that 𝒞 is appropriate, subject-matter experts should be consulted.

Valid configurations in 𝒞_k must fulfill several fundamental requirements that apply to any problem instance and any configuration time step. For example, open sectors must be spatially contiguous. Airspace configurations at each configuration time step must assign each sector to exactly one open sector. Only one or two operating positions can be allocated to any open sector. Each open sector must be assigned to a single workstation, and a workstation cannot be assigned to multiple open sectors.

Valid configurations can also be specific to certain problem instances and may apply for all or only a subset of configuration time steps. For example, configurations containing certain open sectors might be denoted as invalid because they are geographically too large to be displayed with appropriate resolution on a scope. Other configurations might be invalid for some period of time due to temporary workstation equipment outages. More permanent technological limitations, such as radio frequency coverage issues, may also limit the set of valid configurations. Training sessions may require that certain open sectors be a part of any configuration utilized for certain configuration time steps. The number of available controller personnel can impose an upper bound on the number of operating positions that can be used in a configuration. This list is not exhaustive: any configuration can be removed from consideration during any configuration time step. Furthermore, a simple extension of this decision model would allow transitions between configurations that are deemed impossible to be removed from consideration.
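The fundamental requirements above can be expressed as a small validity check. This is a minimal sketch under the assumption that spatial contiguity is decided by a caller-supplied predicate, since contiguity requires sector adjacency data not shown here; all function and parameter names are illustrative.

```python
# Minimal validity check for a candidate configuration, following the
# fundamental requirements in the text: sectors are partitioned among
# open sectors, 1-2 operating positions per open sector, and one
# workstation per open sector with no workstation reused.
def is_valid_configuration(sectors, workstations, open_sectors,
                           positions, stations, is_contiguous):
    """open_sectors: iterable of frozensets of sector names;
    positions: dict open sector -> operating position count;
    stations: dict open sector -> workstation;
    is_contiguous: predicate supplied with adjacency knowledge."""
    open_sectors = list(open_sectors)
    covered = [s for sec in open_sectors for s in sec]
    if sorted(covered) != sorted(sectors):   # each sector in exactly one open sector
        return False
    if not all(is_contiguous(sec) for sec in open_sectors):
        return False
    if not all(positions[sec] in (1, 2) for sec in open_sectors):
        return False
    used = [stations[sec] for sec in open_sectors]
    if len(set(used)) != len(used):          # no workstation serves two open sectors
        return False
    return all(w in workstations for w in used)
```

Instance-specific restrictions (outages, oversized open sectors, staffing limits) would simply be additional predicates composed with this one.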

Objective

The problem objective is to minimize a configuration schedule cost g(C, T). The cost for a configuration schedule advisory is a sum of the costs incurred by the scheduled configuration at each configuration time step in the time horizon:

g(C, T) = \sum_{k=1}^{K} g_k(C_{k−1}, T_{k−1}, C_k, T_k).    (2.1)

For a single configuration time step, the cost is a weighted sum of a static cost and a reconfiguration cost:

g_k(C_{k−1}, T_{k−1}, C_k, T_k) = g_k^S(C_k, T_k) + β^R g_k^R(C_{k−1}, T_{k−1}, C_k, T_k),    (2.2)

where β^R is the reconfiguration weight. The static cost penalizes configurations that do not facilitate safe and efficient operations during the configuration time step, such as those that require controllers to control excessively high or low amounts of traffic. The reconfiguration cost penalizes changes in configurations that occur between configuration time steps, particularly those that require controllers to transfer control of many aircraft from one workstation to another.

More detailed descriptions of the static and reconfiguration costs are provided next. These cost functions are complex and involve many parameters. Complexity and parameters were only added to the cost functions when subject-matter expert feedback indicated that simpler versions were not sufficient for producing useful configuration advisories. A simpler objective function used for our initial work is described in [29].

Static Cost

The static cost penalizes configurations with too much or too little traffic in open sectors. Too much traffic can impair the ability of controllers to provide safe and efficient control, and too little traffic can lead to controllers that are not sufficiently engaged to provide safe and efficient control. The term "static" is used because this cost is associated with periods when the configuration is static, although of course the traffic changes during these periods. It is the sum over all the open sectors of a

static cost computed for each open sector:

g_k^S(C_k, T_k) = \sum_{σ ∈ C_k^A} g_k^{S,OS}(σ, C_k^{OP}(σ), T_k),    (2.3)

where g_k^{S,OS}(σ, C_k^{OP}(σ), T_k) is the static cost for a single open sector σ allocated C_k^{OP}(σ) operating positions at configuration time step k while experiencing traffic situation T_k. Furthermore, the static cost for a single open sector at a configuration time step is itself a sum of a single traffic time step cost g_t^{S,OS} over all the traffic time steps in the configuration time step.

The static cost for a single open sector during a single traffic time step takes on different forms depending on the number of operating positions allocated to the open sector. The function g_t^{S,OS,1OP}(σ, T_t) is the static cost for an open sector σ that is allocated one operating position at traffic time step t with traffic situation T_t. The corresponding function g_t^{S,OS,2OP}(σ, T_t) is the static cost for an open sector σ that is allocated two operating positions at traffic time step t with traffic situation T_t. These one- and two-operating position static cost functions have identical forms but different parameter values. The functions depend entirely on the open sector load ℓ(σ, T_t), which is computed as the number of aircraft in the open sector divided by the MAP value of the open sector. Each function penalizes open sector loads that are too high or too low to facilitate safe and efficient operations in an open sector. The one-operating position function is

g_t^{S,OS,1OP}(σ, T_t) = \underline{α}^{1OP} \left( \left[ \underline{θ}^{1OP} − ℓ(σ, T_t) \right]_+ \right)^{\underline{γ}^{1OP}} + \overline{α}^{1OP} \left( \left[ ℓ(σ, T_t) − \overline{θ}^{1OP} \right]_+ \right)^{\overline{γ}^{1OP}}    (2.4)

and the two-operating position function is identical except that it uses different parameters. Here [a]_+ evaluates to a if a ≥ 0 and to 0 if a < 0. The twelve parameters in these two cost functions are

• \underline{α}^{1OP} and \underline{α}^{2OP}: one- and two-operating position low load weights,

• \underline{θ}^{1OP} and \underline{θ}^{2OP}: one- and two-operating position low load thresholds,

• \underline{γ}^{1OP} and \underline{γ}^{2OP}: one- and two-operating position low load exponents,

• \overline{α}^{1OP} and \overline{α}^{2OP}: one- and two-operating position high load weights,

• \overline{θ}^{1OP} and \overline{θ}^{2OP}: one- and two-operating position high load thresholds, and

• \overline{γ}^{1OP} and \overline{γ}^{2OP}: one- and two-operating position high load exponents.

Figure 2.2 contains plots of the static cost for a single open sector during a single traffic time step when allocated one and two operating positions.

Figure 2.2: Traffic time step static cost for an open sector. (Plot of the one- and two-operating position static costs as functions of open sector load from 0% to 120%.)
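The per-traffic-time-step static cost of Eq. (2.4) translates directly into code. The parameter names and values below are illustrative; in the model the low- and high-load weights, thresholds, and exponents take different values for one- and two-operating position open sectors.

```python
# Sketch of the Eq. (2.4) static cost for one open sector at one traffic
# time step. The load is the aircraft count divided by the open sector's
# MAP value; loads below the low threshold or above the high threshold
# incur power-law penalties. All parameter values are illustrative.
def static_cost_one_step(n_aircraft, map_value,
                         alpha_lo, theta_lo, gamma_lo,
                         alpha_hi, theta_hi, gamma_hi):
    load = n_aircraft / map_value               # open sector load
    lo = max(theta_lo - load, 0.0) ** gamma_lo  # under-load penalty [.]_+
    hi = max(load - theta_hi, 0.0) ** gamma_hi  # over-load penalty [.]_+
    return alpha_lo * lo + alpha_hi * hi
```

With thresholds at, say, 20% and 80% load, the cost is zero on the mid-range plateau and grows on either side, reproducing the U-shaped curves of Fig. 2.2.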

Reconfiguration Cost

The reconfiguration cost penalizes reconfigurations, especially reconfigurations that are likely to induce a significant amount of effort for the controllers involved. The reconfiguration cost is the sum of two different reconfiguration costs:

g_k^R(C_{k−1}, T_{k−1}, C_k, T_k) = g_k^{R,OP}(C_{k−1}, T_{k−1}, C_k, T_k) + g_k^{R,W}(C_{k−1}, T_{k−1}, C_k, T_k).    (2.5)

These types of reconfiguration costs are the reconfiguration operating position cost (g_k^{R,OP}) and the reconfiguration workstation cost (g_k^{R,W}). The reconfiguration operating position cost g_k^{R,OP} penalizes changes in the number of operating positions allocated to an open sector when the sectors assigned to the

open sector do not change. When a D-side operating position is added to an open sector, certain responsibilities associated with the aircraft in the open sector must be transferred from the R-side operating position to the incoming D-side operating position. Conversely, when a D-side operating position is removed from an open sector, these responsibilities must be transferred from the D-side operating position back to the R-side operating position. The reconfiguration operating position cost attempts to quantify and penalize these efforts, which may distract controllers from safely and efficiently managing aircraft. It is a sum over costs for all open sectors experiencing changes in the number of operating positions but no changes in airspace, and it differentiates between open sectors gaining and losing operating positions. The reconfiguration operating position gain cost g_k^{R,OP+} penalizes effort associated with the addition of a second (D-side) operating position and the reconfiguration operating position loss cost g_k^{R,OP−} penalizes effort associated with the removal of a second (D-side) operating position. The form of g_k^{R,OP+} is

g_k^{R,OP+}(σ, T_{k−1}, T_k) = β^{R,OP+,O} + β^{R,OP+,T} \left| \bigcup_{s∈σ,\, t∈ψ_±^{R,OP}} T_t^s \right|    (2.6)

and the form of g_k^{R,OP−} is identical but with different parameters. The reconfiguration operating position gain overhead and loss overhead weights β^{R,OP+,O} and β^{R,OP−,O} penalize the overhead work associated with adding or removing a D-side operating position from an open sector, respectively. Overhead work refers to work that is independent of the number of aircraft in the open sector, such as describing active special-use airspace. Finally, the reconfiguration operating position gain transfer and loss transfer weights β^{R,OP+,T} and β^{R,OP−,T} are multiplied by aircraft counts to penalize the aircraft transfer work associated with adding or removing a D-side operating position from an open sector, respectively. Transfer work refers to work that results from transferring responsibilities associated with monitoring an aircraft from one operating position to another, such as indicating that an aircraft has been cleared to climb to a particular altitude. The set ψ_±^{R,OP} is a set of traffic time steps surrounding the reconfiguration happening between configuration time steps k − 1 and k. It is

expressed as $\psi_\pm^{R,OP} = \{(k-1)D + 1 - \ell_-^{R,OP}, \ldots, (k-1)D + \ell_+^{R,OP}\}$, where $\ell_-^{R,OP} \geq 0$ and $\ell_+^{R,OP} \geq 1$ are parameters that determine the number of traffic time steps used to count the number of unique aircraft involved in the reconfiguration.

The other type of reconfiguration cost is the reconfiguration workstation cost $g_k^{R,W}$. When the sectors that make up an open sector change, control of sector airspace and any aircraft within it must move from operating position(s) at one workstation to operating position(s) at another workstation. There is overhead and transfer work associated with this type of reconfiguration. Furthermore, this transfer can be even more difficult when the operating positions giving and receiving responsibility for airspace and aircraft are already busy monitoring other “background” aircraft that are not being transferred. Finally, there is work associated with moving the operating positions associated with an open sector from one workstation to another, even when the open sector airspace and number of allocated operating positions do not change. The reconfiguration workstation cost attempts to quantify and penalize these types of work, and it is the sum of four terms:
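As a concrete illustration of the window definition, the set of traffic time steps around a reconfiguration can be computed directly from its parameters. The sketch below is our own; the function and variable names (`psi_window`, `l_minus`, `l_plus`, `D`) are illustrative stand-ins for the symbols in the text.

```python
def psi_window(k: int, D: int, l_minus: int, l_plus: int) -> range:
    """Traffic time steps surrounding the reconfiguration between
    configuration time steps k-1 and k (a sketch of the set psi_+/-).

    l_minus >= 0 steps before and l_plus >= 1 steps after the event are
    included; D is the number of traffic time steps per configuration
    time step. All names here are our own illustrations."""
    start = (k - 1) * D + 1 - l_minus
    stop = (k - 1) * D + l_plus  # inclusive upper endpoint
    return range(start, stop + 1)

# With D = 60 (1-minute traffic time steps in a 1-hour configuration
# time step), l_minus = 0 and l_plus = 2, the window around k = 2 is:
print(list(psi_window(2, 60, 0, 2)))  # [61, 62]
```

Aircraft counted in the reconfiguration costs are the unique aircraft present in the affected sectors during these time steps.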

$$\begin{aligned}
g_k^{R,W}(C_{k-1}, T_{k-1}, C_k, T_k) ={}& g_k^{R,W,O}(C_{k-1}, T_{k-1}, C_k, T_k) + g_k^{R,W,T}(C_{k-1}, T_{k-1}, C_k, T_k) \\
&+ g_k^{R,W,B}(C_{k-1}, T_{k-1}, C_k, T_k) + g_k^{R,W,M}(C_{k-1}, T_{k-1}, C_k, T_k). \qquad (2.7)
\end{aligned}$$

The first term is the reconfiguration workstation overhead cost $g_k^{R,W,O}$. It penalizes the overhead work associated with setting up and deploying new open sectors: open sectors that were not used in the configuration in the previous configuration time step. The form of this cost is simply a reconfiguration workstation overhead weight $\beta^{R,W,O}$ multiplied by the number of new open sectors in the configuration. Therefore, the reconfiguration workstation overhead weight is a cost per new open sector.

The second type of work that makes up the reconfiguration workstation cost is the reconfiguration workstation transfer cost $g_k^{R,W,T}$. It penalizes work associated with transferring aircraft from operating position(s) at one workstation to operating position(s) at another workstation, as quantified by a per-aircraft reconfiguration workstation transfer weight $\beta^{R,W,T}$ multiplied by the number of aircraft transferred.

R,W The set of traffic time steps ψ± surrounding the reconfiguation at the start of R,W R,W configuration time step k is expressed as ψ± = {(k −1)D +1−+− ,...,(k −1)D + R,W R,W R,W ++ }, where +− ≥ 0and++ ≥ 1 are parameters that determine the number of traffic time steps used to count the number of unique aircraft involved in the reconfiguration. Transferring airspace and aircraft between operating position(s) at different work- stations is particularly difficult when the operating position(s) involvedarebusymon- itoring other background aircraft at the time of the transfer. The reconfiguration R,W,B workstation background cost gk penalizes the additional effort required due to the background aircraft. It is a per-aircraft reconfiguration workstation background weight βR,W,B multiplied by the number of aircraft that are monitored but not transferred by operating position(s) involved in transferring other aircraft. This cost also uses R,W the set of traffic time steps ψ± to count the number of unique aircraft involved in the reconfiguration. The fourth and final term in the reconfiguration workstation cost quantifies the work associated with moving control of an open sector from one workstation to an- other without making any other changes to the open sector. This reconfiguration R,W,M workstation move cost gk is expressed as a per-aircraft reconfiguration worksta- tion move weight βR,W,M multiplied by the number of unique aircraft that are in the R,W open sector airspace during the traffic time steps in ψ± .

Decision Model Summary

The CSA problem is

$$\begin{aligned}
\text{minimize} \quad & g(C, T) & (2.8) \\
\text{subject to} \quad & C_k \in \mathcal{C}_k, \quad k = 0, 1, 2, \ldots, K. & (2.9)
\end{aligned}$$

The CSA problem can be mapped to a lowest-cost path problem on a time-expanded graph. A sample portion of such a time-expanded graph is depicted in Fig. 2.3. Each valid configuration at a time step is a node in the graph, and transitions from configurations in one time step to configurations in the next time step are edges.

Figure 2.3: Portion of the graph for a sample CSA problem instance.

Node costs correspond to the static cost and edge costs correspond to the reconfiguration cost. The origin for the path is dictated by the initial configuration $C_0$. Any configuration in $\mathcal{C}_K$ is a valid destination for the path. If $n$ is the largest number of valid configurations at any configuration time step, then the time-expanded graph corresponding to a CSA problem instance has at most $nK + 1$ nodes and at most $n^2 K + n$ edges.

For certain types of optimization problems (such as linear programs), instances with more constraints may require algorithms to perform more computations before finding a solution. This is not the case for the CSA problem. For CSA problem instances, additional constraints mean that more configurations (nodes) must be removed from the relevant time-expanded graph. This does add some complexity to the process of specifying the graph for a problem instance, but specifying an instance is typically much less computationally expensive than solving it. Furthermore, the computational burden of finding a lowest-cost path decreases as graph size decreases (i.e., as nodes and edges are removed), so additional constraints lead to CSA problem instances that require fewer computations to solve.
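The mapping above can be sketched in code. The construction below is our own illustration, not the dissertation's implementation: `valid`, `static_cost`, and `reconf_cost` are stand-ins for the sets $\mathcal{C}_k$ and the cost terms, and each edge is charged the reconfiguration cost of the transition plus the static cost of the node it enters, so a lowest-cost path cost equals an advisory cost.

```python
from itertools import product

def build_time_expanded_graph(valid, static_cost, reconf_cost):
    """Sketch of the time-expanded CSA graph as an edge-cost map.

    valid[k] lists the valid configurations at configuration time step
    k, with valid[0] = [C0]; static_cost(k, c) and
    reconf_cost(k, c_prev, c) stand in for the model's cost terms.
    Nodes are (k, configuration) pairs."""
    edges = {}
    for k in range(1, len(valid)):
        for c_prev, c in product(valid[k - 1], valid[k]):
            edges[((k - 1, c_prev), (k, c))] = (
                reconf_cost(k, c_prev, c) + static_cost(k, c)
            )
    return edges

# Tiny instance: one initial configuration, two candidates at step 1.
edges = build_time_expanded_graph(
    [[0], [0, 1]],
    static_cost=lambda k, c: c,
    reconf_cost=lambda k, c_prev, c: 0 if c_prev == c else 2,
)
print(len(edges))  # 2
```

With $n$ configurations per step, the loop creates at most $n^2$ edges per transition, matching the $n^2 K + n$ bound in the text.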

2.4 Solution Algorithm for CSA

Many algorithms can efficiently compute optimal and near-optimal solutions to the lowest-cost path problem [59]. We implemented and utilized the well-known A* algorithm to solve the CSA problem. For completeness, Appendix A specifies the Forward A* algorithm, initially proposed by Hart et al. in [70], as applied to the CSA problem. The algorithm can be run to construct an advisory starting from $C_0$ and working forward in time (as specified in Appendix A), or vice versa. The algorithm makes use of $\hat{J}_k(C_k)$, an under-estimate of the minimum cost-to-go to a final configuration from a configuration $C_k$. A priority queue open and a set closed are also used. The configuration $C_{k-1}^\dagger(C_k)$ is the previous configuration in a partial advisory under consideration. This previous configuration may or may not be part of a minimum-cost partial advisory. The $\bar{J}_k(C_k)$ values are upper bounds on the minimum cost-so-far required to get from $C_0$ to $C_k$. The cost $\bar{J}_k(C_k)$ can be achieved by constructing the partial advisory defined by the previous configurations returned by $C_{k-1}^\dagger(C_k)$ back to $C_0$. Furthermore, using an under-estimate for $\hat{J}_k(C_k)$ ensures that once a configuration is moved to closed, $\bar{J}_k(C_k)$ for that configuration is actually the minimum cost $J_k^*(C_k)$ for a partial advisory from $C_0$ to $C_k$.

Since the cost for each time step is nonnegative ($g_k(C_{k-1}, T_{k-1}, C_k, T_k) \geq 0$) and $\hat{J}_k(C_k)$ is an under-estimate of the minimum cost-to-go, this algorithm returns a lowest-cost advisory. We use $\hat{J}_k(C_k) \triangleq 0$, so A* is a version of Dijkstra's algorithm that terminates as soon as a shortest path from $C_0$ to a configuration in $\mathcal{C}_K$ is found. Assuming a heap implementation of the priority queue structure, Dijkstra's algorithm has computational complexity $O((n^2 K + n) \log(nK + 1)) = O(n^2 K \log(nK + 1))$ [83].
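The zero-heuristic special case can be sketched compactly. The code below is our own minimal version, not the Appendix A listing: it runs Dijkstra's algorithm (A* with $\hat{J}_k(C_k) = 0$) on the time-expanded graph and stops as soon as a configuration at the final time step is settled, at which point its cost-so-far is the true minimum. `valid` and `cost` are illustrative stand-ins for $\mathcal{C}_k$ and $g_k$.

```python
import heapq

def lowest_cost_advisory(valid, cost):
    """Dijkstra's algorithm on the time-expanded CSA graph (a sketch).

    valid[k] lists the valid configurations at step k, with
    valid[0] = [C0]; cost(k, c_prev, c) plays the role of the per-step
    cost g_k. Returns (advisory cost, advisory) or None."""
    K = len(valid) - 1
    c0 = valid[0][0]
    open_queue = [(0.0, 0, c0, (c0,))]   # (cost-so-far, k, config, path)
    closed = set()
    while open_queue:
        j_bar, k, c, path = heapq.heappop(open_queue)
        if (k, c) in closed:
            continue
        closed.add((k, c))               # j_bar is now the true minimum
        if k == K:                       # first settled final-step node
            return j_bar, path           # defines a lowest-cost advisory
        for c_next in valid[k + 1]:
            if (k + 1, c_next) not in closed:
                heapq.heappush(
                    open_queue,
                    (j_bar + cost(k + 1, c, c_next), k + 1, c_next,
                     path + (c_next,)))
    return None
```

Because every per-step cost is nonnegative, the first final-step node removed from the priority queue cannot be improved upon, which justifies the early termination.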

2.5 Default Decision Model Parameters

There are 25 parameters in the decision model cost function. This section describes our efforts at finding default values for these parameters.

2.5.1 Static and Reconfiguration Cost Parameters

Default values for the static and reconfiguration cost parameters were selected based on descriptions of operating procedures and also discussions with and a survey of subject-matter experts. The survey contained 13 questions; four questions were related to static cost parameters and nine questions were related to reconfiguration cost parameters. For example, some of these questions asked for estimates of the θ threshold parameters in the static cost. Others asked for estimates of the relative amount of work required to perform certain tasks involved in reconfigurations; these estimates could be used to derive various β parameters in the reconfiguration cost. Experts were also encouraged to provide comments. The survey was sent to nine subject-matter experts. All of the experts had some experience as an air traffic controller, and many of them are currently or have been supervisors of an area, meaning that they have made decisions about how to configure sectors, operating positions, and workstations. Completed surveys were returned by five of these experts, and four of them answered every question. For certain parameters, default values were determined by simply averaging the expert responses on a particular question. In other cases, engineering judgment and insights from expert comments were used to derive default values.

Static Cost

The 12 parameters in the static cost are listed along with default values in Table 2.1. These are the parameter values that were used to generate the static cost curves in Fig. 2.2. Experts indicated that sector load levels between the θ thresholds are in a “sweet spot” in which controllers are typically engaged but not over-worked. Lower load levels do not facilitate safe and efficient operations because controllers may not be busy enough to stay focused, while higher load levels do not facilitate safe and efficient operations because controllers may be too busy to carefully control the traffic. The α weights and γ exponents were set to produce larger penalties for open sectors with excessive traffic levels than for open sectors with too little traffic. They were set so that an open sector allocated two operating positions at its MAP value would incur 1

Table 2.1: Static Cost Parameters

Parameter                Name                                          Default Value
$\bar{\alpha}^{1OP}$     One-operating position high load weight       6.66
$\alpha^{1OP}$           One-operating position low load weight        3.33
$\bar{\alpha}^{2OP}$     Two-operating position high load weight       10
$\alpha^{2OP}$           Two-operating position low load weight        2.83
$\bar{\gamma}^{1OP}$     One-operating position high load exponent     2
$\gamma^{1OP}$           One-operating position low load exponent      1.5
$\bar{\gamma}^{2OP}$     Two-operating position high load exponent     2
$\gamma^{2OP}$           Two-operating position low load exponent      2
$\bar{\theta}^{1OP}$     One-operating position high load threshold    0.65
$\theta^{1OP}$           One-operating position low load threshold     0.3
$\bar{\theta}^{2OP}$     Two-operating position high load threshold    0.9
$\theta^{2OP}$           Two-operating position low load threshold     0.5

unit of cost per traffic time step. Furthermore, these parameters were set so that an open sector allocated a single operating position at 75% of its MAP value would also incur 1 unit of cost per traffic time step. Costs increase slowly (sub-linearly) to this point as open sector loads increase beyond the “sweet spot,” but higher load levels lead to fast (super-linear) growth in costs beyond 75% and 100% for open sectors allocated one and two operating positions, respectively. Finally, these parameters were set so that open sectors with no traffic at all would incur 1 and 2 units of cost per traffic time step in open sectors with one and two operating positions, respectively.

Reconfiguration Cost

The 12 parameters in the reconfiguration cost are listed along with default values in Table 2.2.

Table 2.2: Reconfiguration Cost Parameters

Parameter            Name                                                                              Default Value
$\beta^{R,OP+,O}$    Reconfiguration operating position gain overhead weight                           0.45
$\beta^{R,OP-,O}$    Reconfiguration operating position loss overhead weight                           0.01
$\beta^{R,OP+,T}$    Reconfiguration operating position gain transfer weight                           0.6
$\beta^{R,OP-,T}$    Reconfiguration operating position loss transfer weight                           0.3
$\beta^{R,W,B}$      Reconfiguration workstation background weight                                     0.5
$\beta^{R,W,M}$      Reconfiguration workstation move weight                                           1.8
$\beta^{R,W,O}$      Reconfiguration workstation overhead weight                                       1
$\beta^{R,W,T}$      Reconfiguration workstation transfer weight                                       2
$\ell_+^{R,OP}$      Traffic time steps after a reconfiguration event included in $\psi_\pm^{R,OP}$    2
$\ell_-^{R,OP}$      Traffic time steps before a reconfiguration event included in $\psi_\pm^{R,OP}$   0
$\ell_+^{R,W}$       Traffic time steps after a reconfiguration event included in $\psi_\pm^{R,W}$     2
$\ell_-^{R,W}$       Traffic time steps before a reconfiguration event included in $\psi_\pm^{R,W}$    1

Subject-matter expert survey responses were primarily used to determine these parameter values. Most of the relevant survey questions asked for estimates of the relative difficulty of various types of configuration changes; these relative difficulties roughly correspond to ratios of various parameters. Parameter values were selected to

be consistent with these survey-derived ratios. In addition to the survey results and discussions with experts, FAA procedures documents were used to better understand the steps involved in various configuration changes and to set parameter values accordingly. Section 2-2-4 of FAA Order 7210.3X concerning Facility Operation and Administration provides guidelines for the steps that are required during a transfer of position responsibility [57]. Appendix J of the Cleveland Air Route Traffic Control Center Standard Operating Procedures contains position relief briefing checklists that are used at that facility [56].

Any configuration change involves one or more position responsibility transfers. Moving airspace between workstations involves opening or closing R-side operating positions. Briefings associated with these processes are more involved than briefings related to opening or closing D-side positions. Therefore, parameters associated with changes in the number of operating positions ($\beta^{R,OP+,O}$, $\beta^{R,OP+,T}$, $\beta^{R,OP-,O}$, and $\beta^{R,OP-,T}$) are smaller than corresponding parameters associated with airspace moving from one workstation to another ($\beta^{R,W,O}$ and $\beta^{R,W,T}$). Furthermore, removing a D-side operating position is less difficult than adding a D-side operating position because when a D-side position is removed, the controller working in the corresponding R-side operating position already knows almost everything that the controller working on the closing D-side position knows. Therefore, parameters related to removing operating positions ($\beta^{R,OP-,O}$ and $\beta^{R,OP-,T}$) are smaller than corresponding parameters related to adding operating positions ($\beta^{R,OP+,O}$ and $\beta^{R,OP+,T}$).
The “transfer” parameters, which are multiplied by the number of aircraft that are transferred in various reconfigurations ($\beta^{R,OP+,T}$, $\beta^{R,OP-,T}$, and $\beta^{R,W,T}$), are larger than the “overhead” parameters, which contribute to the reconfiguration cost regardless of the number of aircraft being transferred ($\beta^{R,OP+,O}$, $\beta^{R,OP-,O}$, and $\beta^{R,W,O}$). Comments provided in survey responses indicate that the majority of the discussion in most reconfiguration briefings is dedicated to communicating attributes of aircraft, because aircraft states are more complicated than the state of non-aircraft elements like airspace, and because aspects of pairs of aircraft, such as conflicts, may also need to be communicated. In fact, feedback suggests that each individual transferred aircraft generates more reconfiguration effort than what is required for all of the

aircraft-independent items combined. However, aircraft that are controlled, but not transferred, when airspace is added or removed from an open sector do not substantially increase the effort involved in the reconfiguration briefing, leading to a value for $\beta^{R,W,B}$ that is less than $\beta^{R,W,O}$ and $\beta^{R,W,T}$. Finally, based on survey feedback, the reconfiguration workstation move weight was set to a value less than the corresponding workstation transfer weight but more than the workstation background weight. Although workstation moves are relatively rare, they usually happen when the controllers working in the open sector operating positions are replaced with different controllers. This would require a briefing similar to the briefing associated with sectors moving between workstations, providing some motivation for $\beta^{R,W,M}$ being almost as large as $\beta^{R,W,T}$.

The $\ell$ parameters, which specify the duration of the $\psi_\pm$ intervals, were set to create a 2-traffic time step interval for changes in the number of operating positions and a 3-traffic time step interval for workstation-related changes. These parameters were set assuming that the traffic time step length $\delta$ was 1 minute. A larger interval was selected for workstation-related changes because expert feedback suggests that these briefings typically take a longer period of time.

2.5.2 Reconfiguration Weight

The static and reconfiguration costs are competing objectives: one can typically be reduced by tolerating an increase in the other. The reconfiguration weight parameter $\beta^R$ determines the relative importance of these costs in the problem objective. A configuration schedule advisory generated with a high $\beta^R$ value will usually involve fewer or less disruptive reconfigurations than one generated with a lower value, but it will also usually involve open sectors that are over- or under-loaded for longer durations. We used data describing historical operations and fast-time simulations of the solution algorithm to find a default value of $\beta^R$ that captures the relative importance of these competing objectives for decision makers. More precisely, we measured the static and workload costs incurred by airspace configurations found in historical data. These costs were compared to the costs of advisories produced by the

algorithm for various values of $\beta^R$.

We analyzed sector configuration and traffic data from ZOB area 4 for 230 non-weekend and non-holiday days selected from 20 October 2011 to 19 October 2012. We excluded weekends and holidays because they might involve low-volume or atypical traffic patterns, leading to configuration selections that are correspondingly atypical. For each day, and for 14 values of $\beta^R$, the algorithm generated advisories that specified configurations from 6 am to midnight local time. To approximate how such an algorithm might be used in operations, we utilized a rolling horizon technique. This technique begins by calculating an advisory for the first two hours, but only the first hour is implemented. One hour later, starting from what is now the state of the area, a new two-hour advisory is calculated, and again only the first hour is implemented. This process continues until the end of the time horizon.

The resulting static and reconfiguration costs were recorded for each day and for 14 values of $\beta^R$, along with the historical costs based on historical traffic and airspace configuration data. We only considered airspace and workstation configurations because historical operating position data were not available. Static cost parameters were selected to produce a cost curve that is roughly halfway between the one- and two-operating position curves in Fig. 2.2. Also, the airspace configurations available to the algorithm only included the five most common historical airspace configurations. In more than 99.99% of the time under consideration, the historical airspace configuration was selected from these five configurations.

Figure 2.4 shows, for 1 May 2012, the costs incurred historically and by the algorithm for various $\beta^R$ values.
Algorithm configuration schedule advisory costs are plotted with black dots and connected with a line that starts at the point corresponding to the smallest $\beta^R$ value and ends at the point corresponding to the largest $\beta^R$ value. The costs of the historical configurations used that day are indicated by the gray point.

Since $\beta^R$ is a parameter that controls the relative importance of static and reconfiguration costs, it is appropriate to compare the ratio of static and reconfiguration costs produced by the algorithm to those produced by the corresponding historical configurations. We sought a value $\beta^{R*}$ that minimizes the difference between the

Figure 2.4: Cost trade-off curve for 1 May 2012. (Reconfiguration cost versus static cost, with algorithm points from $\beta^R = 0.5$ to $\beta^R = 15$ and the historical point marked.)

historical and algorithm cost ratios for all days in the analysis:

$$\beta^{R*} \in \operatorname*{argmin}_{\beta^R} \sum_{d \in \mathcal{D}} \left| \frac{g^R(\beta^R, d)}{g^S(\beta^R, d)} - \frac{g^R_{\text{hist}}(d)}{g^S_{\text{hist}}(d)} \right|, \qquad (2.10)$$

where $g^R(\beta^R, d)$ and $g^S(\beta^R, d)$ are the respective reconfiguration and static costs produced by the algorithm for day $d$ with $\beta^R$, and $g^R_{\text{hist}}$ and $g^S_{\text{hist}}$ are the historical equivalents. The set $\mathcal{D}$ contains all of the days used in the analysis. As can be seen in the plot of the summed ratio error in Fig. 2.5, the value of $\beta^R$ that produces the minimum error is 1.75.
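The selection criterion in Eq. (2.10) is straightforward to compute once the per-day costs are tabulated. The sketch below is our own illustration of that computation; the containers `alg_costs` and `hist_costs` are hypothetical stand-ins for the study's recorded data, not an actual data format from the dissertation.

```python
def summed_ratio_error(beta, days, alg_costs, hist_costs):
    """Total absolute difference between algorithm and historical
    reconfiguration/static cost ratios over all days (Eq. (2.10)).

    alg_costs maps (beta, day) to a (reconf_cost, static_cost) pair and
    hist_costs maps day to the historical pair; both are our own
    illustrative stand-ins."""
    total = 0.0
    for d in days:
        g_r, g_s = alg_costs[(beta, d)]
        h_r, h_s = hist_costs[d]
        total += abs(g_r / g_s - h_r / h_s)
    return total

def best_beta(betas, days, alg_costs, hist_costs):
    """beta^{R*}: the candidate with the smallest summed ratio error."""
    return min(betas,
               key=lambda b: summed_ratio_error(b, days, alg_costs,
                                                hist_costs))
```

Running `best_beta` over the 14 candidate values and the recorded daily costs would reproduce the error-minimizing choice described in the text.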

2.6 Reconfiguration Weight Parametric Study

To gain insight into the impact of the value selected for the reconfiguration weight, we conducted a parametric study. In this study, the fast-time simulations described in Section 2.5.2 were repeated, but with three changes. First, only five $\beta^R$ values were investigated: 1, 1.5, 1.75, 2, and 2.5. These values are all near the error-minimizing value of 1.75 found in that section. Second, one historical non-weekend and non-holiday day that had been eliminated from the set of days used in Section 2.5.2 was


Figure 2.5: Summed cost ratio error versus $\beta^R$.

included in this study. This was due to a third change, which was to add the configuration used on that day and several others to the five that were considered in that analysis. The algorithm was permitted to use a set of 16 airspace configurations in this analysis. Subject-matter experts indicated that each of these airspace configurations could be implemented, even if some of them had rarely or never been utilized historically. The algorithm was given this additional flexibility because this analysis does not involve the sort of explicit comparison of advisories and historical configurations that was used in selecting a default value for $\beta^R$ in Section 2.5.2. The purpose of this analysis, rather, was to determine the impact of changing the value of $\beta^R$ on operationally-meaningful metrics when any implementable airspace configuration is available to the algorithm.

The first operationally-meaningful metric quantifies the amount of time that the open sectors experienced different levels of traffic. Specifically, the metric measures the amount of time that the traffic load for the open sectors falls into one of three ranges: below, in, or above the zero-cost region of the static cost curve. The two curves in Fig. 2.2 depict the static cost, at different traffic loads, for a single open sector when it has been allocated either one or two operating positions. The zero-cost load regions were between 30% and 65% when an open sector was allocated one operating position and between 50% and 90% when it was allocated two operating

positions. In this study, a static cost curve between these two curves was used because historical operating position data were not available; the zero-cost load region for this new curve included open sector loads between 40% and 77.5%. At any given minute, the configuration in place specifies some set of open sectors. At this minute, each of these open sectors is experiencing some level of traffic that leads to an open sector load that is either below, in, or above the zero-cost region of the static cost curve. By keeping track of how many open sectors experience loads below, in, and above this region at each minute, the percent of open sector–minutes spent below, in, and above the region over the 231-day data set can be computed. Large percentages of open sector–minutes in the zero-cost region indicate that the configurations are maintaining open sector loads that facilitate safe and efficient operations.

Figure 2.6 shows that these open sector–minute percentages are not sensitive to changes in $\beta^R$. The advisories produce open sectors that spend 30%–35% of open sector–minutes below, 60%–63% in, and around 6% above the zero-cost region. The historical configurations spend a much larger percentage of the open sector–minutes (70%) below the region and smaller percentages of open sector–minutes in and above the region (29% and 1%, respectively). It is not clear why historical open sectors tend to experience loads that are lower than the preferred load levels indicated by subject-matter experts, but this may be related to operational constraints such as low traffic levels relative to the number of controllers available to staff operating positions.

The second operationally-meaningful metric investigated is the duration of open sector instances. An open sector instance is an open sector that is used in airspace configurations for some duration of time. Each time a reconfiguration changes the airspace configuration, there is at least one open sector present in the new airspace configuration that was not present in the old configuration. This open sector will persist for some period of time, potentially even as other open sectors are changed by later reconfigurations. Eventually, this open sector will no longer be used by a later airspace configuration. The time that the open sector is in use is the duration of the open sector instance. Creating and terminating open sector instances requires some effort, so it is generally preferable for open sector instances to have long durations. Open sector instances with durations less than 60 minutes can be


Figure 2.6: Distribution of open sector–minutes versus $\beta^R$.

particularly disruptive. Figure 2.7 (a) shows the cumulative distributions of open sector instance durations for advisories generated with the five values of $\beta^R$ and also for the historical configurations, and Fig. 2.7 (b) shows the same distributions but only for instances with durations in the range of zero to 60 minutes. As expected, lower values of $\beta^R$ generated more open sector instances with shorter durations. The historical distribution falls somewhere in between the distributions corresponding to advisories generated with the five $\beta^R$ values, except for durations of 30 minutes or less. For example, there were only 11 historical open sector instance durations of 15 minutes or less, but there were between 37 and 63 open sector instance durations of 15 minutes or less in the advisories (depending on the value of $\beta^R$). This corresponds to only one such instance every three or four days on average, but this advisory behavior may still be undesirable. This behavior may be alleviated in cases where the algorithm is given the flexibility to change the number of operating positions assigned to each open sector. Furthermore, these short-duration open sector instances are less disruptive when traffic is light, but traffic levels are not captured in this metric, so the disruption they induce may not be severe.
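The instance-duration metric can be sketched directly from a configuration timeline. The representation below (one airspace configuration per traffic time step, each given as a frozenset of open sectors) is our own illustration, not the dissertation's data format.

```python
def instance_durations(timeline):
    """Durations (in time steps) of open sector instances (a sketch).

    timeline is a sequence of airspace configurations, one per traffic
    time step, each a frozenset of open sectors. An instance starts
    when an open sector first appears in the configuration and ends
    when a later configuration no longer contains it."""
    durations = []
    active = {}  # open sector -> time step at which its instance began
    prev = frozenset()
    for t, config in enumerate(timeline):
        for sector in config - prev:     # instance begins
            active[sector] = t
        for sector in prev - config:     # instance ends
            durations.append(t - active.pop(sector))
        prev = config
    # close out instances still open at the end of the timeline
    durations.extend(len(timeline) - start for start in active.values())
    return durations

# Sector 'A' lasts two steps; sector 'B' starts later and runs to the end.
print(instance_durations(
    [frozenset({"A"}), frozenset({"A", "B"}), frozenset({"B"})]))
```

A histogram of these durations over the 231-day data set yields the cumulative distributions compared in Fig. 2.7.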

(a) Durations up to 18 hours.

(b) Durations up to 60 minutes.

Figure 2.7: Cumulative distributions of open sector instance durations.

2.7 Multiple Advisories Optimization Problem

In this section, we define and then motivate an optimization problem that is based upon our decision model (the CSA problem). This problem requests multiple advisories instead of just one configuration schedule advisory.

2.7.1 M ε-Optimal d-Distinct Configuration Schedule Advisories Problem

In this section, we define an optimization problem based on our prescriptive decision model (the CSA problem) that requests a set of M valid advisories. The first advisory must be optimal for the corresponding CSA problem. Each other returned advisory must achieve a cost value that is within some fraction of the minimum cost value. Finally, the advisories in each pair of returned advisories must be sufficiently different from each other according to an advisory difference metric. The M ε-Optimal d-Distinct Configuration Schedule Advisories (M-ε-d-CSAs) problem is

$$\begin{aligned}
\underset{\mathcal{C}^M}{\text{minimize}} \quad & \sum_{m=1}^{M} g(C^m, T) & (2.11) \\
\text{subject to} \quad & |\mathcal{C}^M| = M & (2.12) \\
& C_k^m \in \mathcal{C}_k, \quad k = 0, 1, 2, \ldots, K, \quad m = 1, 2, \ldots, M & (2.13) \\
& C^1 \in \mathcal{C}^*_{\text{CSA}}(\mathcal{C}, T) & (2.14) \\
& \frac{g(C^m, T) - g(C^1, T)}{g(C^1, T)} \leq \varepsilon, \quad m = 2, 3, \ldots, M & (2.15) \\
& \Phi(C^m, C^{m'}) \geq d \quad \forall\, m, m' \in \{1, 2, \ldots, M\},\; m \neq m'. & (2.16)
\end{aligned}$$

Here $\mathcal{C}^M = \{C^1, \ldots, C^M\}$ is the set of $M \in \mathbb{Z}_{++}$ advisories that make up a solution to the problem. The objective (2.11) is to minimize the sum of the costs of the $M$ advisories. Constraint (2.12) requires that $M$ advisories be returned. Although not required by the problem statement, in the event that $M$ feasible advisories cannot be found, the algorithms we devise and investigate will return a set containing fewer

advisories. Constraint (2.13) ensures that each advisory is valid. The set $\mathcal{C}^*_{\text{CSA}}(\mathcal{C}, T) \subseteq \mathcal{C}$ is the set of solutions to a CSA problem instance when the set of valid configuration schedules is $\mathcal{C}$ and the traffic situation data is $T$. Therefore, constraint (2.14) requires that the first advisory be an optimal advisory for the corresponding CSA problem. Constraint (2.15) requires that each of the other advisories achieves a cost that does not exceed a particular value. More precisely, the cost of each other advisory in excess of the minimum advisory cost ($g(C^m, T) - g(C^1, T)$), expressed as a fraction of the minimum advisory cost $g(C^1, T)$, must not exceed an excess cost fraction bound $\varepsilon \in \mathbb{R}_+$. Finally, constraint (2.16) requires that each pair of advisories, when compared using an advisory difference metric $\Phi : \mathcal{C} \times \mathcal{C} \to \mathbb{R}_+$, achieves a difference of at least $d \in \mathbb{R}_+$.

The M-ε-d-CSAs problem is NP-complete. This is shown in Appendix B by demonstrating that the Independent Set (IS) problem, which is NP-complete and even difficult to approximate [123], is polynomial-time reducible to a decision version of M-ε-d-CSAs. A solution to the decision version of M-ε-d-CSAs can be efficiently certified. Together, these two facts mean that M-ε-d-CSAs is NP-complete [83].
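The efficient certification of a candidate solution can be sketched as a direct check of the constraints. The function below is our own illustration: `cost` evaluates $g(C, T)$, `diff` evaluates $\Phi$, and the first advisory is assumed to already be CSA-optimal (checking constraints (2.13) and (2.14) requires solving the underlying CSA instance and is outside this sketch).

```python
def is_feasible_advisory_set(advisories, cost, diff, M, eps, d):
    """Check constraints (2.12), (2.15), and (2.16) of M-eps-d-CSAs
    for a candidate advisory set (a sketch; names are ours).

    advisories[0] is assumed to be an optimal CSA advisory with
    positive cost; cost and diff stand in for g(., T) and Phi."""
    if len(advisories) != M:                      # constraint (2.12)
        return False
    g_min = cost(advisories[0])
    for c in advisories[1:]:                      # constraint (2.15)
        if (cost(c) - g_min) / g_min > eps:
            return False
    for i, c in enumerate(advisories):            # constraint (2.16)
        for c2 in advisories[i + 1:]:
            if diff(c, c2) < d:
                return False
    return True
```

Because each check is a simple pass over the candidate set, certification runs in polynomial time, which is the fact used alongside the IS reduction to establish NP-completeness.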

Advisory Difference Metric

The advisory difference metric $\Phi$ maps a pair of advisories from $\mathcal{C}$ to a non-negative real number. It could take many forms, but in this work it is defined as

$$\Phi(C, C') = \sum_{k=1}^{K} \phi(C_k, C'_k), \qquad (2.17)$$

where $\phi : \mathcal{C}_k \times \mathcal{C}_k \to \mathbb{R}_+$ is a configuration difference metric. It defines a difference between valid configurations. The configuration difference metric is

$$\phi(C_k, C'_k) = \begin{cases} 1 & \text{if } C_k^A \neq C_k'^A \\ 0 & \text{else.} \end{cases} \qquad (2.18)$$

In other words, pairs of configurations that use different airspace configurations (sets of open sectors) achieve a configuration difference of 1, while all other pairs of configurations achieve a configuration difference of 0. Therefore, when using this advisory difference metric, constraint (2.16) requires that the two advisories in each pair of returned advisories utilize different airspace configurations during at least $d$ time steps.

2.7.2 Motivation

Related Research

To clarify our contributions and to further motivate our approach for finding multiple advisories, we will describe some related research. In the context of “planning” in the artificial intelligence domain, Nguyen et al. have investigated situations in which user preferences are unknown or only a distribution over user objective function parameter values is provided [97]. Neither of these situations exactly describes the situation we investigate, but this work helps motivate the M-ε-d-CSAs problem statement. In general, Nguyen et al. propose presenting the user with options and then allowing the user to resolve uncertainty in the objective function by selecting a solution from the set of options. Similarly, implicit in the M-ε-d-CSAs problem statement is the assumption that the user will resolve the impact of unmodeled or imperfectly-modeled components of the area configuration problem by selecting an advisory option that performs well enough with respect to these components.

More specifically, when user preferences are completely unknown, Nguyen et al. suggest presenting the user with a diverse set of plans. If diversity is defined appropriately, diverse plans are less likely to be equally preferred by users, so a diverse set of plans increases the chance that one of the plans in the set will be acceptable to the user. In the area configuration context, user preferences are not completely unknown and various objective functions have been defined [27–30, 33, 41, 51, 121]. However, even the relatively complex objective function that we have defined in Section 2.3 does not fully capture the relationship between configurations and safe and

efficient traffic operations. Therefore, rather than striving only for diversity, the M- ε-d-CSAs problem statement requests a set of solutions that is both diverse and in which each plan performs well with respect to the objective function. The hope is that the diverse plans will achieve a variety of levels of performance with respect to unmodeled or imperfectly-modeled components of the problem so that at least one will perform well enough to be useful in operations. The search for multiple low-cost and diverse paths has also been investigated in the operations research literature. Motivated by the “compromises” or “approximations” present in any mathematical model, Bellman and Kabala studied the problem of finding a given number of paths with the lowest combined cost [20]. This lowest-cost paths problem has received considerable attention in the literature [20,52,53,72,78,94], but simply searching for the lowest-cost paths without considering the diversity of the paths might lead to similar paths. For the reasons discussed earlier, we seek a set of near-optimal paths that are also diverse. For certain special definitions of diversity, the problem of finding a low-cost set of diverse paths has been studied [12,13,90,91, 101,118,119]. For example, Suurballe proposes an algorithm that searches for low-cost paths that share no nodes [118]. The motivation provided in this body of research is often related to robustness—a path from the set should remain available even if some set of nodes or links fails. Unfortunately, subject-matter expertfeedbacksuggeststhat none of the definitions of diversity used in this body of research correspond to the type of diversity that matters for area configurations advisories (see Section 2.7.1). 
This is not surprising because, although supervisors would likely benefit from advisories that are robust to the uncertainty in the cost that results from uncertainty in future air traffic, we are not concerned here with finding robust advisories.

Figure 2.8 is a notional visualization of one dimension of the space of problems involving finding a set of M paths. Problems concerned with finding low-cost paths, such as the lowest-cost paths problem, are at one end of the spectrum in this dimension. Problems concerned with finding distinct paths are at the other end of the spectrum. The distinct paths problem, which seeks a given number of paths that share no nodes without concern for path costs, is at this extreme of the spectrum. The M-ε-d-CSAs problem lies somewhere in the middle of the spectrum. It seeks

low-cost paths (advisories), but at the same time requires that each returned path achieve a certain level of difference from the other returned paths. Efficient algorithms have been specified for some problems in this space, mostly at the ends of this spectrum, but we show in Appendix B that the M-ε-d-CSAs problem, located somewhere in the middle, is NP-complete.

Figure 2.8: One dimension of the space of problems involving finding M paths. [Figure: a spectrum with the M lowest-cost paths problem (concerned with low-cost paths) at one end, the M distinct paths problem (concerned with distinct paths) at the other, and M-ε-d-CSAs in between.]

One algorithm from the literature that we utilized was proposed by Byers and Waterman [40]. This dynamic programming-based algorithm finds all paths that achieve a cost within some fraction of the minimum cost of a path. Once this set is known for a given excess cost fraction bound ε, we can solve the M-ε-d-CSAs problem by searching through the set for the M lowest-cost paths that meet the other problem constraints. Although computationally expensive, this approach can serve as a benchmark: it enables us to study how well more computationally-efficient heuristics perform on some small problem instances.
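The Byers and Waterman approach can be sketched on a time-expanded graph as backward value iteration followed by a depth-first search that extends a partial path only when its cost-so-far plus the exact cost-to-go can still satisfy the (1 + ε) bound. The graph encoding and all names here are illustrative assumptions, not the dissertation's implementation.

```python
def reverse_value_iteration(nodes_by_step, step_cost, K):
    """Minimum cost-to-go J[k][node] on a time-expanded graph where
    nodes_by_step[k] lists the valid configurations at step k and
    step_cost(k, u, v) is the cost of moving from u at step k to v at k+1."""
    J = [dict() for _ in range(K + 1)]
    for v in nodes_by_step[K]:
        J[K][v] = 0.0
    for k in range(K - 1, -1, -1):
        for u in nodes_by_step[k]:
            J[k][u] = min(step_cost(k, u, v) + J[k + 1][v]
                          for v in nodes_by_step[k + 1])
    return J

def near_optimal_paths(nodes_by_step, step_cost, start, K, eps):
    """All full paths from `start` with cost within (1 + eps) of the minimum.
    A partial path is extended only if cost-so-far + cost-to-go can still
    satisfy the bound, which prunes hopeless branches early."""
    J = reverse_value_iteration(nodes_by_step, step_cost, K)
    bound = (1 + eps) * J[0][start]
    results = []

    def dfs(k, node, cost_so_far, path):
        if k == K:
            results.append((cost_so_far, path))
            return
        for v in nodes_by_step[k + 1]:
            c = cost_so_far + step_cost(k, node, v)
            if c + J[k + 1][v] <= bound + 1e-9:
                dfs(k + 1, v, c, path + [v])

    dfs(0, start, 0.0, [start])
    return results
```

The recursion mirrors the recursive implementation described above; a stack-based version would avoid deep recursion on large instances.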

Decision Model and Solution Algorithm Issues

Any decision model is only an approximation of reality, and no mathematical optimization problem statement perfectly captures a decision faced in reality. Compromises must always be made to arrive at a tractable problem for which a solution can be found in a reasonable amount of time. There are three major known issues related to our decision model, the CSA problem, and the algorithm used to solve it: the configuration model is incomplete, the CSA problem objective function is imperfect, and algorithmic considerations imply that enforcing a certain desirable type of constraint on advisories would reduce the likelihood of finding a solution in a reasonable amount of time. Each of these issues will now be discussed in detail.

First, as was alluded to in Section 2.2, our model of area configurations is incomplete. The main missing component is a mapping of available controllers to operating positions. This component of configurations is excluded from the problem statement because the influence of this component on the safety and efficiency of operations, which depends on controller skill, fatigue, preferences, and personality, is particularly difficult to quantify. Furthermore, controller breaks and controller tasks that do not involve controlling traffic (such as certain training activities) play an important role in this component of configurations. Such constraints would further complicate the model and require additional inputs. Finally, considering this component of configurations would increase the number of possible configurations considerably, which would potentially make finding a good advisory in a reasonable amount of time more difficult. For these reasons, this component of the configuration is left for the supervisor to determine without the assistance of an advisory.

Next, the CSA objective function (2.8) is lacking in a number of ways. First of all, it utilizes an incomplete configuration model and therefore cannot account for relevant components of area operations that are not captured in the model. For example, the model does not capture which human controller is assigned to which operating position. Secondly, the difficulty of quantifying controller workload [106] also makes it difficult to determine when traffic operations in an area would be safe and efficient. The static cost in the CSA problem objective function uses aircraft count divided by open sector MAP as a measure of the workload associated with controlling traffic. This measure of workload does not explicitly take into account many important factors, such as climbing and descending aircraft [85].
The reconfiguration cost penalizes a different sort of controller workload: the effort associated with transitioning from one configuration to another. Relatively little is known about this type of workload [79, 132]. Furthermore, the number of operating positions allocated to each open sector impacts workload, but it is not obvious how to quantify this impact [120, 122]. Thirdly, even if controller workload were known exactly, it is not clear what levels of controller workload facilitate safe and efficient traffic operations. Although the work associated with changing configurations seems to always hinder safe and efficient operations, too much or too little workload might lead to degraded safety and efficiency

when controlling traffic in-between configuration changes. The relative impact of the two types of controller workload on safe and efficient traffic operations, quantified in the CSA objective function by the βR parameter, is not precisely known. This βR parameter and many others in the CSA objective function could be further tuned.

Lastly, although the CSA problem statement is capable of handling many constraints that arise when configuring an area, the algorithm currently in use to solve the problem does not naturally handle one commonly-requested type of constraint: once an open sector has been generated by a configuration change, users often request that this open sector not be changed again for at least some period of time (approximately 15–30 minutes). Satisfying this constraint would involve either changing the decision model to keep track of this open sector duration or utilizing a different solution algorithm. Changing the model would lead to a much larger set of possible solutions and longer algorithm run times, and we have not found a suitable algorithm to handle such constraints when they are enforced but the model is not changed. While appropriate tuning of the βR parameter can ensure that this constraint is usually met, the algorithm may still propose some advisories that violate it (see Section 2.6).

Acceptability Among Users

We consulted area supervisors, the target users of configuration schedule advisories, during the development of the model and problem statement. In May of 2012, nine supervisors or former supervisors provided input and feedback in a two-day workshop at NASA Ames Research Center. We also made five visits to three FAA Air Route Traffic Control Centers, including three visits to Cleveland Center.

Area supervisors requested that the lowest-cost advisory be presented along with the advisories achieving the second- and perhaps third-lowest costs. The reason for this request may be that they are aware of the issues with the model and problem statement listed previously. However, the supervisors did trust the model and problem statement enough to request that the advisories be ranked according to the CSA objective function and to request that sufficiently sub-optimal advisories not be displayed. Advisories that were considerably sub-optimal would rarely be selected over the optimal advisory.

Area supervisors have also requested that the multiple advisories be different. They indicated that two advisories are different if they use different airspace configurations for at least 30 minutes of a two-hour time period. The reason for this request may be that, given two similar advisories, it is unlikely that users would strongly prefer one over the other [97].

2.8 Solution Algorithms for M-ε-d-CSAs

Three algorithms that solve or attempt to solve the M-ε-d-CSAs problem have been developed. One of these, described in Section 2.8.1, is a trivial extension to an algorithm proposed by Byers and Waterman [40]. It returns an optimal solution and serves as a benchmark. Section 2.8.2 describes a novel heuristic that is based on value iteration. A class of novel heuristics based on the A∗ algorithm [70] is described in Section 2.8.3, culminating in the specification of the third algorithm. In addition to these three algorithms, we studied the solution to the M lowest-cost paths relaxation of M-ε-d-CSAs. This is described in Section 2.8.4. Finally, in Section 2.8.5, we summarize the computational complexity of these algorithms.

2.8.1 Value Iteration Fraction Optimal with Exhaustive Advisory Search

The Value Iteration Fraction Optimal with Exhaustive Advisory Search (VIFOEAS) algorithm is an extension of the algorithm proposed by Byers and Waterman for finding all paths with costs that satisfy a bound on the excess cost fraction [40]. VIFOEAS is specified in Algorithm 1. The first step is to use the well-known reverse value iteration algorithm to find optimal costs-to-go; it is specified in Appendix C for completeness. More concretely, it is denoted as ReverseVI(C, T) and it returns two mappings: (1) the minimum cost-to-go, J̄*_k(C_k), from each valid configuration; and (2) the next configuration, C̄*_{k+1}(C_k), in a minimum-cost partial advisory starting from each configuration. Next, a recursive implementation of the algorithm proposed in [40] is utilized to find all the valid advisories that achieve the excess cost fraction

bound constraint (2.15). We utilize a recursive function referred to as Recursive Value Iteration Fraction Optimal (RecursiveVIFO) to implement the algorithm proposed in [40] (see Appendix D). RecursiveVIFO returns the set Cε ⊆ C of advisories that are valid for the corresponding CSA problem instance and achieve costs within the fraction ε of the minimum cost. Next, by searching exhaustively through Cε to find the cost-minimizing M advisories that meet the other M-ε-d-CSAs constraints, the VIFOEAS algorithm solves the M-ε-d-CSAs problem.

Algorithm 1 Value Iteration Fraction Optimal with Exhaustive Advisory Search (VIFOEAS)
Require: C, T, M, ε, d   {M-ε-d-CSAs problem instance specification}
  {J̄*_k(C_k)}_{k=0}^{K−1}, {C̄*_{k+1}(C_k)}_{k=0}^{K−1} ← ReverseVI(C, T)
  Cε ← RecursiveVIFO(C, {J̄*_k(C_k)}_{k=0}^{K−1}, C_0, (1 + ε) × J̄*_0(C_0), 0)
  J^M_min ← ∞
  for each advisory set C^M ⊆ Cε such that |C^M| = M and C^1 ∈ C^1_CSA(C, T) do
    if min_{C^m, C^m′ ∈ C^M, m ≠ m′} Φ(C^m, C^m′) ≥ d and Σ_{C^m ∈ C^M} g(C^m, T) ≤ J^M_min then
      J^M_min ← Σ_{C^m ∈ C^M} g(C^m, T)
      C^M_min ← C^M
  return C^M_min

Since M-ε-d-CSAs is NP-complete, it is not surprising that finding the set of advisories that meet the cost constraint (2.15) and searching through these advisories for the M that solve the problem can both be computationally demanding. Building the set of near-optimal advisories Cε involves first performing backwards value iteration and then a depth-first search. For a general graph, the computational complexity of value iteration is O((nK + 1)(n²K + n)) = O(n³K²) [83]. However, the special time-expanded structure of the graph we are studying reduces the number of computations required. Value iteration in this case involves, for each of K time steps, performing a minimization over at most n values for at most n nodes, so its computational complexity is only O(n²K) (see Appendix C). In the worst case, the depth-first search algorithm will visit each node once along each of the n edges into the node, and each time the node is visited, proceed to each of the n other nodes that can be reached from the node. This implies n² computations per node, and there are nK nodes (other than C_0), leading to a computational complexity of O(n³K) for the depth-first search

and a total complexity of O(n²K + n³K) = O(n³K). Of course, the computational effort required for the depth-first search varies considerably from problem instance to problem instance, depending largely on the degree of near-optimality desired. The computational and memory efficiency of the depth-first search could be improved with a stack implementation, but we utilized this recursive implementation because it was simpler to implement. Once Cε has been determined, searching through it requires the investigation of (|Cε| choose M) subsets of advisories. This search can usually be computed more efficiently than the brute-force search described in Algorithm 1 because the optimal advisory is typically unique. For example, the search is only linear in the size of Cε when the optimal advisory is unique and M = 2. Although implemented, these simpler and more efficient variations of the search through Cε are not precisely documented here.
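The exhaustive search through the near-optimal set can be sketched as a brute-force scan of M-element subsets. The candidate encoding and the `cost` and `diff` interfaces are hypothetical, and the requirement that the first advisory be optimal is omitted for brevity.

```python
from itertools import combinations

def best_advisory_set(candidates, cost, diff, M, d):
    """Brute-force scan of M-element subsets of the near-optimal advisories:
    return the cheapest subset whose advisories pairwise differ by >= d.
    (cost and diff are hypothetical callables; the constraint that the
    first advisory be optimal is omitted here for brevity.)"""
    best, best_cost = None, float("inf")
    for subset in combinations(candidates, M):
        if all(diff(a, b) >= d for a, b in combinations(subset, 2)):
            total = sum(cost(a) for a in subset)
            if total < best_cost:
                best, best_cost = subset, total
    return best
```

Because the number of subsets grows combinatorially with the size of the near-optimal set, this scan is only workable as a benchmark on small instances, which matches the role of VIFOEAS above.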

2.8.2 Forward and Backward Value Iteration with Sequential Advisory Search

The Forward and Backward Value Iteration with Sequential Advisory Search (FBVISAS) algorithm is a heuristic that reduces the number of advisories to investigate by using minimum cost-to-go and minimum cost-so-far information. More precisely, it stores each node in a priority queue J, ranked from the smallest to the largest sum of the minimum cost-so-far and the minimum cost-to-go. The nodes are then searched in the order specified by the priority queue until M configuration advisories that meet the constraints are found. The specification of FBVISAS is in Algorithm 2. This algorithm makes use of the reverse value iteration algorithm specified in Appendix C as well as the forward value iteration algorithm, which is referred to as ForwardVI. ForwardVI is completely analogous to ReverseVI except that it operates forward in time through the configuration time steps.

FBVISAS requires value iteration to be performed once in each direction, inducing a computational complexity of O(2n²K). Then, each of up to nK + 1 nodes must be investigated and inserted into the priority queue (with a computational

Algorithm 2 Forward and Backward Value Iteration with Sequential Advisory Search (FBVISAS)
Require: C, T, M, ε, d   {M-ε-d-CSAs problem instance specification}
  C^M ← ∅
  {J̄*_k(C_k)}_{k=0}^{K−1}, {C̄*_{k+1}(C_k)}_{k=0}^{K−1} ← ReverseVI(C, T)
  {J̲*_k(C_k)}_{k=1}^{K}, {C̲*_{k−1}(C_k)}_{k=1}^{K} ← ForwardVI(C, T)
  for k = 1, ..., K do
    for C_k ∈ C_k do
      J*(C_k) ← J̲*_k(C_k) + J̄*_k(C_k)
      Insert C_k into priority queue J with key J*(C_k)
  C_k ← minimum-key configuration extracted from J   {Use to seed construction of minimum-cost advisory}
  J* ← last extracted key J*(C_k)
  Use C̲*_{k−1}(C_k) and C̄*_{k+1}(C_k) iteratively to define minimum-cost advisory C^1
  Add C^1 to C^M
  repeat
    C_k ← minimum-key configuration extracted from J
    J* ← last extracted key J*(C_k)
    Use C̲*_{k−1}(C_k) and C̄*_{k+1}(C_k) iteratively to define advisory C̃
    if Φ(C^m, C̃) ≥ d for all C^m ∈ C^M then
      Add C̃ to C^M
  until |C^M| = M or |J| = 0
  return C^M

complexity of up to O(log(nK + 1)) each time for a heap implementation). Finally, all nK + 1 nodes may need to be removed from the priority queue, which introduces a computational complexity of up to O(log(nK + 1)) for each node, again assuming a heap implementation. The total computational complexity is thus O(2n²K + 2(nK + 1) log(nK + 1)) = O(n²K + nK log(nK + 1)). If it is not possible to find M advisories that satisfy the constraints, the algorithm returns as many as it can find.

FBVISAS can be expected to perform relatively well on problem instances in which the βR-weighted reconfiguration cost is relatively important compared to the static cost. This could occur when βR is relatively large. We will make this statement more precise by defining two classes of problem instances. These classes are merely illustrative; neither of them is likely to be encountered when actually selecting area

configuration advisories to present to supervisors. In one class, βR is very large and we show that FBVISAS will find an optimal second advisory when one exists. In the other class, βR = 0 and we show that FBVISAS will never return a second advisory when d > 1. Appendix E contains proofs of these propositions.
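A minimal sketch of the FBVISAS idea, under an assumed time-expanded-graph encoding: compute cost-to-go and cost-so-far tables with argmin pointers, rank nodes by the sum of the two, and emit the advisory through each node in rank order, keeping those that differ from all previously kept advisories by at least d. All names are illustrative.

```python
import heapq

def fbvisas_sketch(nodes_by_step, step_cost, K, M, d, diff):
    """FBVISAS-style heuristic sketch (illustrative encoding): rank every
    (step, node) pair by min cost-so-far + min cost-to-go and extract the
    advisory passing through each node in that order, keeping advisories
    that differ from all previously kept ones by at least d."""
    # Backward pass: minimum cost-to-go and best successor pointers.
    J_back = [dict() for _ in range(K + 1)]
    succ = [dict() for _ in range(K)]
    for v in nodes_by_step[K]:
        J_back[K][v] = 0.0
    for k in range(K - 1, -1, -1):
        for u in nodes_by_step[k]:
            best = min(nodes_by_step[k + 1],
                       key=lambda v: step_cost(k, u, v) + J_back[k + 1][v])
            succ[k][u] = best
            J_back[k][u] = step_cost(k, u, best) + J_back[k + 1][best]
    # Forward pass: minimum cost-so-far and best predecessor pointers.
    J_fwd = [dict() for _ in range(K + 1)]
    pred = [dict() for _ in range(K + 1)]
    for u in nodes_by_step[0]:
        J_fwd[0][u] = 0.0
    for k in range(1, K + 1):
        for v in nodes_by_step[k]:
            best = min(nodes_by_step[k - 1],
                       key=lambda u: J_fwd[k - 1][u] + step_cost(k - 1, u, v))
            pred[k][v] = best
            J_fwd[k][v] = J_fwd[k - 1][best] + step_cost(k - 1, best, v)
    # Priority queue keyed by the sum of the two minima.
    heap = [(J_fwd[k][v] + J_back[k][v], k, v)
            for k in range(1, K + 1) for v in nodes_by_step[k]]
    heapq.heapify(heap)
    kept = []
    while heap and len(kept) < M:
        _, k, v = heapq.heappop(heap)
        # Reconstruct the advisory through (k, v) via the argmin pointers.
        path = [v]
        for j in range(k, 0, -1):
            path.append(pred[j][path[-1]])
        path.reverse()
        node = v
        for j in range(k, K):
            node = succ[j][node]
            path.append(node)
        if all(diff(p, path) >= d for p in kept):
            kept.append(path)
    return kept
```

The first node popped lies on a minimum-cost advisory, so the first kept advisory is optimal; later pops either reproduce an already-kept advisory (rejected by the difference check) or yield a new candidate.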

2.8.3 Sequential Distinct A∗ Algorithms

The Sequential Distinct A∗ (SDA∗) algorithm is a novel heuristic that extends the well-known A∗ algorithm for computing a lowest-cost path proposed by Hart et al. [70]. SDA∗ runs an A∗-like algorithm M times, using a different priority queue ranking function each time. The Sequential Distinct A∗ with Shortcuts (SDA∗-SC) algorithm extends SDA∗ by using information from the initial, time-reversed execution of the A∗-like algorithm to find low-cost and distinct advisories more quickly. Before describing the SDA∗ and SDA∗-SC algorithms, we will specify the A∗ and A∗-like algorithms that they utilize. In particular, to better motivate SDA∗ and SDA∗-SC, we will first prove some properties of Forward Distinct A∗ (FDA∗), a related algorithm.

Forward Distinct A∗ Algorithm.

The Forward Distinct A∗ (FDA∗) algorithm is specified in Appendix F. FDA∗ is not used by SDA∗ or SDA∗-SC, but it is closely related to the Forward Distinct A∗ with Shortcuts (FDA∗-SC) algorithm that they do utilize. FDA∗ is related to the Lagrange dual problem for certain problem instances, so we will specify and discuss FDA∗ to help explain and motivate FDA∗-SC. As specified, the FDA∗ algorithm only finds a second advisory. It does so by finding an advisory that minimizes

g(C^2, T) + λ (d − Φ(C^1, C^2))     (2.19)

for some λ ∈ R+. This is the Lagrangian of the problem faced when finding the second advisory for some simple M-ε-d-CSAs instances. This property can be used to show that, under the right circumstances, the second advisory returned by FDA∗ satisfies a necessary condition that must be met by any optimal second advisory.

Some algorithms for constrained shortest path problems similarly leverage Lagrange duality [42, 69]. Appendix G provides proofs and further discussion of these properties of FDA∗.
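Because the difference metric sums per-time-step indicators, minimizing the Lagrangian (2.19) reduces to an ordinary shortest-path computation in which the step cost is reduced by λ whenever a step differs from the first advisory (the constant λd does not affect the argmin). The following sketch assumes a time-expanded-graph encoding; all names are illustrative.

```python
def penalized_second_advisory(nodes_by_step, step_cost, first_adv, K, lam):
    """Shortest path under Lagrangian-penalized step costs: subtract lam
    whenever a step's configuration differs from the first advisory's,
    which (up to the constant lam * d) minimizes objective (2.19).
    first_adv[k] is the first advisory's configuration at step k."""
    J = [dict() for _ in range(K + 1)]
    choice = [dict() for _ in range(K)]
    for v in nodes_by_step[K]:
        J[K][v] = 0.0
    for k in range(K - 1, -1, -1):
        for u in nodes_by_step[k]:
            def penalized(v):
                bonus = -lam if v != first_adv[k + 1] else 0.0
                return step_cost(k, u, v) + bonus + J[k + 1][v]
            best = min(nodes_by_step[k + 1], key=penalized)
            choice[k][u] = best
            J[k][u] = penalized(best)
    # Roll the penalized-optimal advisory out from the shared initial node.
    path = [first_adv[0]]
    for k in range(K):
        path.append(choice[k][path[-1]])
    return path
```

With λ = 0 this recovers the ordinary minimum-cost advisory; increasing λ trades cost for difference from the first advisory, which is the trade-off the dual problem tunes.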

Forward Distinct A∗ with Shortcuts Algorithm.

The FDA∗-SC algorithm, which is specified in Appendix H, extends the FDA∗ algorithm in four main ways. The first is that it explicitly describes how to handle problem instances for which M > 2. Second, it confirms that the partial advisory ending at a configuration for a given time step has a chance of meeting the cost and difference constraints before adding the configuration to the queue of configurations and corresponding partial advisories under consideration. Third, it normalizes the terms in the cost function to increase the likelihood that a single value for λ will perform well across problem instances. In particular, while FDA∗ finds a second advisory that minimizes the objective (2.19), FDA∗-SC attempts to minimize

g(C^m, T)/J* + λ [1/(m − 1)] Σ_{m′=1}^{m−1} (d − Φ(C^{m′}, C^m))/(Φ_max − d + 1)

when searching for the m-th advisory. Here Φ_max is the maximum possible advisory difference metric value. Showing that FDA∗-SC attempts to minimize this quantity requires some algebraic manipulations similar to those used in the proof of Lemma 3 in Appendix G. Normalizing by J*, (m − 1), and (Φ_max − d + 1) is designed to increase the likelihood that the magnitude of the first term and the magnitude of the term multiplied by λ will not vary much across problem instances and as we search for subsequent advisories. The hope is that this will ensure that a single value of λ may be sufficient in all of these contexts and that we can avoid re-tuning λ for each problem instance or for searches for each subsequent advisory.

Finally, the fourth change is that the FDA∗-SC algorithm can take shortcuts in an attempt to reduce computation times. The FDA∗-SC algorithm makes use of information that could be provided by an initial ReverseA∗ run that could also find the first returned advisory. In particular, this initial run would yield partial advisory and corresponding cost information from certain configurations at certain time steps to the final time step. This information is in {J̄_k(C_k)}_{k=0}^{K−1} and {C̄†_{k+1}(C_k)}_{k=0}^{K−1}, which are not necessarily defined for all C_k. When investigating a partial advisory ending at some C_k, FDA∗-SC checks to see if, by combining this partial advisory from the initial time step with the partial advisory that continues on from C_k and was explored by the initial ReverseA∗ run, we arrive at a complete shortcut advisory that (1) achieves an acceptable cost value (as determined by the algorithm parameter ε′ ∈ [0, ε]) and (2) also achieves the difference constraint. If this is the case, then the algorithm returns a shortcut advisory the next time it queries the priority queue.
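The normalized score that FDA∗-SC attempts to minimize for a candidate m-th advisory can be computed as follows; the function signature is an illustrative assumption.

```python
def normalized_objective(cost_m, j_star, lam, d, phi_max, diffs_to_previous):
    """Normalized score minimized when searching for the m-th advisory:
    the cost term is scaled by the first advisory's cost j_star, and the
    difference penalty is averaged over the m - 1 previous advisories and
    scaled by (phi_max - d + 1).  diffs_to_previous[i] = Phi(C^{i+1}, C^m)."""
    m_minus_1 = len(diffs_to_previous)
    penalty = sum(d - phi for phi in diffs_to_previous)
    penalty /= m_minus_1 * (phi_max - d + 1)
    return cost_m / j_star + lam * penalty
```

Note that when every previous advisory already differs by at least d, the penalty term is non-positive, so a single λ can remain useful as m grows.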

Sequential Distinct A∗ Algorithm with Shortcuts.

The SDA∗-SC algorithm is specified in Algorithm 3. This algorithm involves using ReverseA∗ to find the first advisory and then FDA∗-SC to find any subsequent advisories. Since SDA∗-SC is essentially M iterations of variations of A∗, its computational complexity is O(Mn²K log(nK + 1)). If ε′ = 0, then the algorithm never takes a shortcut and we refer to it as just SDA∗.

Algorithm 3 Sequential Distinct A∗ with Shortcuts (SDA∗-SC)
Require: C, T, M, ε, d   {M-ε-d-CSAs problem instance specification}
Require: λ, ε′   {Algorithm parameters}
  C^1, {J̄_k(C_k)}_{k=0}^{K−1}, {C̄†_{k+1}(C_k)}_{k=0}^{K−1} ← ReverseA∗(C, T)
  Add C^1 to C^M
  J* ← g(C^1, T)
  for m = 2, ..., M do
    C^M ← FDA∗-SC(C, T, J*, ε, d, C^M, λ, {J̄_k(C_k)}_{k=0}^{K−1}, {C̄†_{k+1}(C_k)}_{k=0}^{K−1}, ε′)
    if |C^M| < m then
      return C^M   {no m-th advisory was found; return those found so far}
  return C^M

2.8.4 Lowest-Cost Paths

If we relax the M-ε-d-CSAs problem by disregarding constraints (2.15) and (2.16), we are left with an M lowest-cost paths problem. Many efficient algorithms that solve this

problem have been developed [20, 52, 53, 72, 78, 94]. For example, the computational complexity of the algorithm proposed by Eppstein is O(n²K + nK log(nK + 1) + M), which is essentially identical to the complexity of the FBVISAS heuristic even though the former algorithm is optimal (albeit for the lowest-cost paths relaxation of the problem of interest). Therefore, we investigated the solution to the relaxed M lowest-cost paths problem version of some M-ε-d-CSAs problem instances to see if the returned advisories happened to meet the constraints (2.15) and (2.16). If this occurred frequently enough, then one of these existing algorithms for the lowest-cost paths problem could be used for the M-ε-d-CSAs problem. We implemented an algorithm for the M lowest-cost paths problem and refer to it as the LCP algorithm.
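An M lowest-cost paths relaxation can be sketched by best-first enumeration with the exact cost-to-go as the heuristic: a simple, unoptimized alternative to Eppstein's algorithm for the time-expanded DAG encoding assumed here, with all names illustrative.

```python
import heapq

def m_lowest_cost_paths(nodes_by_step, step_cost, start, K, M):
    """Best-first enumeration of the M lowest-cost full paths in a
    time-expanded DAG, using the exact cost-to-go as the A*-style
    heuristic (an illustrative, unoptimized alternative to Eppstein)."""
    # Exact cost-to-go via backward value iteration.
    J = [dict() for _ in range(K + 1)]
    for v in nodes_by_step[K]:
        J[K][v] = 0.0
    for k in range(K - 1, -1, -1):
        for u in nodes_by_step[k]:
            J[k][u] = min(step_cost(k, u, v) + J[k + 1][v]
                          for v in nodes_by_step[k + 1])
    # Heap entries: (optimistic total cost, cost so far, step, partial path).
    heap = [(J[0][start], 0.0, 0, [start])]
    out = []
    while heap and len(out) < M:
        _, cost, k, path = heapq.heappop(heap)
        if k == K:
            out.append((cost, path))  # complete paths pop in cost order
            continue
        for v in nodes_by_step[k + 1]:
            c = cost + step_cost(k, path[-1], v)
            heapq.heappush(heap, (c + J[k + 1][v], c, k + 1, path + [v]))
    return out
```

Because the heuristic equals the true completion cost, complete paths leave the heap in nondecreasing cost order, so stopping after M completions yields the M lowest-cost paths.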

2.8.5 Computational Complexity Comparison

The computational complexities of various algorithms proposed and discussed in this chapter are documented in Table 2.3. The complexity is reported when n is the largest number of valid configurations in configuration time steps k = 1, 2, ..., K, which means that the graph under consideration has at most nK + 1 nodes and n²K + n edges when we also account for the initial configuration C_0. The complexity of value iteration for the graph under consideration is lower than for a general graph, and this is reflected in the complexity results for VIFOEAS and FBVISAS. The complexities of the other algorithms do not explicitly consider the special structure of the graph under consideration (i.e., they are based only on the number of nodes and edges in the graph, not on the time-expanded structure). This, along with the worst-case assumption that no shortcuts are found, may explain why the complexity of SDA∗-SC is larger than that of FBVISAS, even though we show in Section 2.9.2 that over thousands of problem instances SDA∗-SC always executed in less time than FBVISAS. As expected, the complexity of VIFOEAS is larger than that of the other algorithms, and it does not even include the cost of searching through up to (|Cε| choose M) advisory subsets. For reference, we also provide the complexities of two representative algorithms for finding M paths [53, 118]. Each is optimal for a particular problem that involves finding M paths. The lowest-cost paths algorithm solves a relaxation of M-ε-d-CSAs (see Section 2.8.4). The lowest-cost node-disjoint paths algorithm solves a problem similar to the M-ε-d-CSAs problem, but with a different Φ function, a particular value of d, and in which constraint (2.14) is not enforced (i.e., the first advisory is not required to be optimal for a single-path problem). The complexity of the lowest-cost paths algorithm is essentially identical to that of FBVISAS (particularly for M = 2 or 3), and the complexity of the lowest-cost node-disjoint paths algorithm is identical to that of SDA∗-SC. The complexities of the FBVISAS and SDA∗-SC heuristics are therefore comparable to those of algorithms that are optimal for other problems that are related to the M-ε-d-CSAs problem.

Table 2.3: Computational Complexity of Algorithms

  Algorithm                                   Complexity                      Reference
  VIFOEAS                                     O(n³K) [a]                      Section 2.8.1
  FBVISAS                                     O(n²K + nK log(nK + 1))         Section 2.8.2
  SDA∗-SC                                     O(Mn²K log(nK + 1))             Section 2.8.3
  Eppstein lowest-cost paths                  O(n²K + nK log(nK + 1) + M)     [53]
  Suurballe lowest-cost node-disjoint paths   O(Mn²K log(nK + 1)) [b]         [118]

  [a] This does not include the complexity of searching through up to (|Cε| choose M) advisory subsets.
  [b] This assumes that Dijkstra's algorithm is used as a subroutine for finding shortest paths.

2.9 Fast-Time Simulations of Algorithms for M-ε-d-CSAs

The performance of the algorithms was first analyzed by using them to solve some small problem instances for which optimal solutions could be computed by the benchmark VIFOEAS algorithm in a reasonable amount of time; this analysis is documented in Section 2.9.1. Then, as described in Section 2.9.2, we analyzed the performance of the two novel heuristics by using them to solve thousands of realistic problem instances.

Before presenting the results of the performance analysis, we will provide some notes about our implementation of the algorithms. The algorithms were coded in Java and executed on a Mac Pro workstation with a Quad-Core Intel Xeon 2.8 GHz processor and 4 GB of memory. Before a problem instance is specified or an algorithm is executed, the set of all possible configurations that could be used in a time step is pre-computed and stored for use by all instances. Specifying a problem instance involves tasks like fetching traffic data, parsing parameter files, and translating constraints into a set of configurations that could be used at some time step by an advisory. These instance-specification tasks are performed using the same code regardless of which algorithm is deployed, so the time required for these tasks does not vary between algorithms. When an algorithm first calls the single-time step cost

function g_k(C_{k−1}, T_{k−1}, C_k, T_k) for a set of inputs, the cost value is computed and then cached. If the algorithm later requires the single-time step cost for the same set of inputs, the cached cost value is returned, which reduces the computational burden of the algorithms.
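The caching behavior can be sketched in a few lines. The Python memoization below is an illustrative stand-in for the dissertation's Java implementation; the cost body and configuration labels are placeholders, and hashable argument types are assumed.

```python
from functools import lru_cache

calls = {"n": 0}  # counts actual cost evaluations, for illustration

@lru_cache(maxsize=None)
def step_cost(k, prev_config, config):
    """Illustrative stand-in for the single-time-step cost g_k: the body
    runs once per distinct argument tuple; repeats are served from the
    cache.  (Placeholder cost; the real computation is far more involved.)"""
    calls["n"] += 1
    # ... an expensive workload computation would go here ...
    return 0.5 if prev_config == config else 1.0  # placeholder cost

step_cost(3, "cfgA", "cfgB")  # computed and cached
step_cost(3, "cfgA", "cfgB")  # returned from the cache; body not re-run
```

Only one evaluation occurs for the repeated call, which mirrors the reduction in computational burden described above.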

2.9.1 Investigation of the Performance of Algorithms on Small Problem Instances

In this section, we investigate the LCP algorithm and the FBVISAS, SDA∗, and SDA∗-SC heuristics by comparing their output to that of the VIFOEAS algorithm, which is guaranteed to return an optimal solution, when they are asked to solve some small problem instances. The small problem instances utilized here are based on ZOB area 4 (see Section 2.2 and Fig. 2.1(a)). The instances come from nine two-hour time periods (6 am–8 am, 8 am–10 am, ..., 10 pm–midnight local time) on two dates (Thursday 01 December 2011 and Tuesday 06 December 2011), so there are 18 instances. The configuration time step was five minutes (∆ = 5) and the traffic time step was one minute (δ = 1). The algorithm was restricted to select from 16 airspace configurations that ZOB staff have identified as feasible. Default workstation assignments were used for all open sectors. There were 173 area configurations (i.e., corresponding airspace, operating position, and workstation configurations) that could be generated from these 16 airspace configurations. However, the advisories were required to use the same number of open sectors as were used in each configuration time step during actual operations on those days. Therefore, there were only between 4 and 80 valid configurations available at each time step instead of 173, which reduced the number of advisories in C considerably. In order to further reduce the size of the space of feasible solutions, only two advisories were requested (M = 2) and the second advisory was required to achieve a cost value relatively close to the minimum cost (ε = 0.2). Based on parameter-tuning efforts documented in Section 2.5.2, the cost function parameter βR was set to 2. Finally, based on discussions with subject-matter experts, d was set equal to 6. This implies that different airspace configurations must be used for at least 30 of the 120 minutes in the two returned advisories. The SDA∗

parameters for these problem instances were set to λ = 0.11875 and ε′ = 0. The SDA∗-SC parameters were set to λ = 0.11875 and ε′ = 0.2.

All of the algorithms found the same minimum-cost first advisory for each of the 18 instances. The VIFOEAS algorithm found two advisories for 12 of the 18 instances. The LCP algorithm found two feasible advisories for only 1 of the 18 instances, indicating that relaxing our problem so that we can solve an M lowest-cost paths problem typically does not lead to M feasible advisories for the M-ε-d-CSAs problem, even when M such advisories exist. The FBVISAS, SDA∗, and SDA∗-SC heuristics found two advisories for 10, 12, and 12 instances, respectively. These results suggest that the novel heuristics typically find a feasible second advisory when one exists.

Since the first advisories returned by the algorithms all achieve the same cost, the suboptimality of the advisories provided by each algorithm was investigated by focusing on the costs of the second advisories. For problem instances with a feasible second advisory, Fig. 2.9 shows the excess cost fraction achieved by the second advisory returned by each of the algorithms. An empty marker located at fraction 0.30 indicates that an algorithm failed to return a second advisory for a problem instance. By comparing the second advisory returned by each algorithm to the second advisory returned by the VIFOEAS algorithm, which is guaranteed to achieve the lowest cost of all the second advisories that satisfy the M-ε-d-CSAs constraints, we see that the FBVISAS, SDA∗, and SDA∗-SC algorithms found a lowest-cost second advisory in 9, 9, and 5 instances, respectively. These results suggest that the FBVISAS and SDA∗ heuristics are better at finding a lowest-cost second advisory than the SDA∗-SC heuristic. This result is to be expected because of the shortcuts that the SDA∗-SC heuristic takes in an attempt to reduce computation times.
When generating these 18 instances 5 times each (once per algorithm), the mean and standard deviation of the time required to load the traffic data were 856 ms and 104 ms, respectively. The mean and standard deviation of the time required for other instance-specification tasks were 35 ms and 4 ms, respectively. However, these problem instances were too small to allow for a meaningful analysis of the computation time required by the algorithms, with one exception. These instances reveal that the

optimal VIFOEAS algorithm is far more computationally intensive than the other algorithms. It required hours to find a solution for some of these problem instances, while the other algorithms required less than a second. Given that we desire to solve larger problem instances than are studied in this section in less than a minute on a tablet computer, these results indicate that the computational performance of VIFOEAS is unacceptable. Even if a more efficient stack implementation were used (see Section 2.8.1), it is unlikely that VIFOEAS would achieve acceptable computation times.

Figure 2.9: Excess cost fraction achieved by second advisories, by instance number, for the LCP, VIFOEAS (ε = 0.2), FBVISAS, SDA∗, and SDA∗-SC algorithms.

2.9.2 Investigation of the Performance of Algorithms Using a Year of Data

To better understand their behavior on realistic problem instances, the FBVISAS and SDA∗-SC heuristics were used to solve many more problem instances. The instances involve ZOB area 4 for 231 non-weekend and non-holiday days selected from 20 October 2011 to 19 October 2012. Weekends and holidays were excluded because they might involve low-volume or atypical traffic patterns, leading to configuration

Table 2.4: Fraction of Problem Instances with One, Two, or Three Advisories Returned

Algorithm   One Advisory   Two Advisories   Three Advisories
SDA∗-SC     0.21           0.20             0.59
FBVISAS     0.15           0.20             0.66

selections that are correspondingly atypical. On each day, 18 problem instances were solved for the time periods from 6:00 am–8:00 am, 7:00 am–9:00 am, . . . , 11:00 pm–1:00 am local time. These overlapping problem instances were selected to approximate how area configuration advisories might be used in practice. A total of 4158 instances were solved by each of the two heuristics.

For these instances, the following parameter values were selected: βR = 2, M = 3, d = 6, and ε = 0.5. The configuration time step was five minutes (∆ = 5) and the traffic time step was one minute (δ = 1). The SDA∗-SC parameters were set to λ = 0.11875 and ε′ = 0.25. The same set of 173 configurations were permitted in these instances as the ones described in Section 2.9.1, but in this case the only additional constraint was that the initial airspace configuration C_0^A be identical to the airspace configuration that historical records indicate was in use at the start of the problem instance.

Table 2.4 shows the fraction of problem instances for which each heuristic returned one, two, or three advisories. The FBVISAS heuristic is more likely to return more advisories: it returned three advisories in 7% more instances and one advisory in 6% fewer instances than the SDA∗-SC heuristic. Both algorithms found exactly two advisories in 508 of the 4158 problem instances and both found three advisories in 2380 of these instances. The relative values of the costs of the returned advisories were investigated for these two classes of instances.

When M = 3, the M-ε-d-CSAs problem objective (2.11) is to minimize g(C1,T) + g(C2,T) + g(C3,T), but constraint (2.14) requires that g(C1,T) be the same for any feasible solution. Therefore, more insight into the relative quality of solutions is gained by investigating only the costs of the second and third returned advisories. For instances where both heuristics returned three advisories, we studied the g(C2,T) +

g(C3,T) ratio: the sum of the costs of the second and third advisories returned by SDA∗-SC divided by the sum of the costs of the second and third advisories returned by FBVISAS. For instances where both heuristics returned exactly two advisories, we studied the g(C2,T) ratio: the cost of the second advisory returned by SDA∗-SC divided by the cost of the second advisory returned by FBVISAS. Figures 2.10(a) and 2.10(b) show the distributions of these ratios. The ratio is between 0.99 and 1.01 in more than 40% of the instances where both heuristics returned three advisories and in more than 80% of the instances where both heuristics returned exactly two advisories. On average, the ratio is only slightly above 1 in each case. While there are a few instances where SDA∗-SC returned advisories costing 20% or more than the advisories found by FBVISAS, these instances were relatively rare and the ratio never exceeded 1.3. SDA∗-SC did sometimes achieve lower costs for these second and third advisories, but it was more common for FBVISAS to find lower-cost advisories. Overall, FBVISAS tended to find slightly lower-cost advisories than SDA∗-SC.

When generating these 4158 instances 2 times each (once per heuristic), the mean and standard deviation of the time required to load the traffic data were 824 ms and 95 ms, respectively. The mean and standard deviation of the time required for other instance-specification tasks were 38 ms and 3 ms, respectively. Figure 2.11 is a box plot showing the spread in the computation times of the two heuristics. SDA∗-SC always had lower computation times than FBVISAS. The average SDA∗-SC computation time (1283 ms) was just over half of the average FBVISAS computation time (2332 ms).

Figure 2.10: Distribution of the ratio of costs (SDA∗-SC/FBVISAS). (a) Sum of costs of the second and third advisories (g(C2,T) + g(C3,T)); mean ratio 1.0161. (b) Cost of the second advisory (g(C2,T)); mean ratio 1.0051.

Figure 2.11: Distributions of computation time required by the SDA∗-SC and FBVISAS heuristics.

2.10 Human-in-the-Loop Experiment Results

Lee et al. conducted a part-task human-in-the-loop experiment in which the SDA∗-SC algorithm described in Section 2.8.3 was utilized by a decision-support tool referred to as the Operational Airspace Sectorization Integrated System (OASIS) [87]. Eight retired Federal Aviation Administration personnel each used the tool in four simulated scenarios. Figure 2.12 shows a screenshot of the OASIS tool running on an Android touch tablet, in which the user is presented with three advisories.

Results of this experiment suggest that presenting multiple near-optimal and distinct advisories added value: in more than 60% of the cases in which the algorithm presented users with more than one advisory, users found the second or third advisory to be more acceptable than the first advisory, even though the first advisory was optimal according to the objective function we used to evaluate the safety and efficiency of operations (see Section 2.3.1). Furthermore, when asked how many advisories they wanted the tool to provide, participants that worked with this

algorithm requested an average of 2.8 advisories.

Figure 2.12: Screenshot of the OASIS decision-support tool.

Results of this experiment also suggest that the algorithm produces highly acceptable advisories. The average acceptability of selected algorithm-generated advisories was more than four on a five-point scale. Furthermore, when users were given an opportunity to modify selected advisories, their modifications were minor and led to no significant improvement in acceptability. Finally, the algorithm executed quickly enough: the goal was to compute a new set of advisories in less than one minute because that was the update rate of the input data, and this goal was met. Users rated the algorithm computation time as highly acceptable.

2.11 Conclusions

Air traffic controller supervisors configure available sector, operating position, and workstation resources to safely and efficiently control air traffic. This chapter describes a prescriptive decision model for this task. The model objective is to minimize a cost function that is a weighted sum of a static cost and a reconfiguration cost. Decreased safety and efficiency associated with a mismatch between the predicted traffic and the configuration is penalized by the static cost; decreased safety and efficiency associated with the effort involved in changing configurations is penalized by the reconfiguration cost. Decision model constraints capture bounds on available resources and other operational considerations. The decision model specifies a lowest-cost path problem on a time-expanded graph. Default values for many objective parameters were set based on observations of, discussions with, and surveys completed by subject-matter experts. To find an appropriate value for the objective function parameter that determines the importance of static cost relative to reconfiguration cost, we compared historical configurations to corresponding algorithm advisories. We also conducted a study to determine the implications of varying this parameter. One investigation in this study revealed that the percent of open sector–minutes spent below, in, and above zero-static-cost open sector load levels is insensitive to changes in the parameter. Furthermore, the open sectors in advisories spend considerably less time below the zero-cost load levels and considerably more time in and above these levels than the open sectors used in historical airspace configurations. A second investigation showed that changes in the value of this parameter can lead to corresponding changes in the distribution of open sector instance durations.
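The lowest-cost path structure of the decision model can be sketched as a simple dynamic program over the time-expanded graph. The configurations, static costs, and function below are hypothetical illustrations, not the dissertation's implementation:

```python
# Minimal sketch (with made-up costs) of the lowest-cost path computation
# underlying the decision model: choose one configuration per time step to
# minimize the sum of a static cost and a beta_R-weighted reconfiguration cost.

def lowest_cost_schedule(configs, static_cost, beta_r, T):
    """Dynamic program over a time-expanded graph.

    configs:     list of configuration labels
    static_cost: dict mapping (config, t) -> static cost at time step t
    beta_r:      weight on the reconfiguration (switching) cost
    T:           number of configuration time steps
    """
    # best[c] = (cost of best schedule ending in config c, that schedule)
    best = {c: (static_cost[(c, 0)], [c]) for c in configs}
    for t in range(1, T):
        new_best = {}
        for c in configs:
            prev_cost, prev_path = min(
                (best[p][0] + (beta_r if p != c else 0.0), best[p][1])
                for p in configs
            )
            new_best[c] = (prev_cost + static_cost[(c, t)], prev_path + [c])
        best = new_best
    return min(best.values())

# Two hypothetical configurations over 3 time steps.
configs = ["one-sector", "two-sector"]
static_cost = {
    ("one-sector", 0): 0.0, ("one-sector", 1): 5.0, ("one-sector", 2): 5.0,
    ("two-sector", 0): 3.0, ("two-sector", 1): 1.0, ("two-sector", 2): 1.0,
}
cost, schedule = lowest_cost_schedule(configs, static_cost, beta_r=2.0, T=3)
print(cost, schedule)  # → 4.0 ['one-sector', 'two-sector', 'two-sector']
```

The schedule pays the β_R = 2 switching penalty once because the lower static costs of the two-sector configuration at later time steps outweigh it, mirroring the static-versus-reconfiguration trade-off described above.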
The second investigation also revealed that the distribution of open sector instance durations in historical airspace configurations is similar to the distribution in the advisories, except that advisory open sector instance durations are more frequently between 5 and 30 minutes than historical open sector durations.

It is difficult to model the relationship between area configurations and safe and

efficient traffic operations, and our decision model is incomplete, so we proposed presenting area supervisors with a set of near-optimal and meaningfully-different configuration schedule advisories. To this end, we studied the M ε-optimal d-distinct Configuration Schedule Advisories problem. This problem is equivalent to finding an optimal path as well as other near-optimal and distinct paths in a time-expanded graph. It is related to the well-known lowest-cost paths problem, but we showed that the constraint requiring distinct paths makes it NP-complete. We proposed three algorithms for this problem and also investigated a fourth that solves a relaxation of this problem, namely, the M lowest-cost paths problem.

The Value Iteration Fraction Optimal with Exhaustive Advisory Search algorithm trivially extends the dynamic programming-based algorithm proposed by Byers and Waterman [40]. Although it is computationally expensive, it finds an optimal solution: the lowest-total-cost set of advisories that meet the problem constraints. We proposed the novel Forward and Backward Value Iteration with Sequential Advisory Search heuristic, which is based on value iteration. Primarily by changing the value of a cost function parameter, we defined a class of problem instances for which the heuristic finds an optimal solution and another class of instances for which it will fail to return a second advisory even when a feasible second advisory exists. We also proposed the novel Sequential Distinct A∗ with Shortcuts heuristic, which leverages and extends the A∗ algorithm. It can be motivated by studying the Lagrangian of certain simple problem instances, and it is designed to achieve relatively low computation times. The computational complexities of these two novel heuristics are comparable to those of representative optimal algorithms for related problems that involve finding a set of paths.
We first evaluated the algorithms on small problem instances by comparing their solutions to the solution returned by the benchmark optimal algorithm. These problem instances revealed the inadequacy of the lowest-cost paths algorithm (it rarely returned feasible second advisories) and the benchmark optimal algorithm (it required excessive computation times). The two heuristics, on the other hand, typically returned feasible solutions when a feasible solution existed and found optimal solutions for half of the instances. The value iteration- and A∗-based heuristics were also used

to solve thousands of realistic problem instances based on Cleveland Air Route Traffic Control Center Area of Specialization 4. For these instances, the average computation time for the A∗-based heuristic (1283 ms) was just over half of the average computation time of the value iteration-based heuristic (2332 ms). However, the value iteration-based heuristic found slightly lower-cost advisories (less than 2% lower on average). The value iteration-based heuristic also returned the problem-requested number of advisories (three) for 66% of the instances, while the A∗-based heuristic only returned three advisories for 59% of the instances. These results indicate that, when compared with the A∗-based heuristic, the value iteration-based heuristic offers higher-quality solutions at the expense of longer computation times.

Finally, the A∗-based heuristic was incorporated into a decision-support tool for area supervisors. Results of a human-in-the-loop experiment involving this tool were encouraging. User feedback indicates that presenting multiple advisories added value, that selected algorithm-generated advisories were highly acceptable, and that the algorithm executed quickly enough. These results suggest that we have developed a decision model and solution algorithm that enabled a helpful decision-support tool.

Chapter 3

Airline Delay Cost Model Evaluation

3.1 Introduction

Flight delays impose financial costs on airlines. These costs can be most accurately estimated by airlines, since they alone are aware of most of the relevant factors, but airlines are reluctant to reveal their costs because doing so could be advantageous to their competitors. Air traffic management researchers are more likely to develop decision-support tools that help airlines and other stakeholders if they accurately understand airline costs and how airlines make decisions to reduce them. This is true even for researchers designing collaborative mechanisms intended to elicit cost-related information from airlines and to allocate capacity accordingly. Since they are unable to compute delay costs precisely, researchers typically assume that airlines make decisions to minimize a delay cost model that can be computed with publicly-available data. Ideally, this model is strongly correlated with the financial costs that airlines are actually trying to reduce and therefore leads to accurate descriptive airline decision models.

Only one effort has been made to tune and validate these delay cost models with records of airline decisions. In her dissertation, Xiong used airline flight cancellation and slot usage data from Ground Delay Programs to tune parameters in discrete


choice models of airline decision making [131]. Discrete choice models assign a probability to every possible choice an airline can make; the probability of a choice depends on the cost of the choice and the costs of the other possible choices. Xiong's research revealed many characteristics of airline delay costs, but it also has some limitations. Most importantly, Xiong did not study any "separable" delay cost models. Separable delay cost models use a flight delay cost model to compute the cost of delaying each flight and assume that the total airline cost is the sum of the individual flight delay costs. While there are exceptions, most air traffic management research assumes a separable delay cost model. Some airline decision-support tools are also based on separable delay cost models [35, 98, 124]. Furthermore, Xiong's use of Ground Delay Program data limited her ability to investigate the difference in the cost of delay for hub-bound flights and other flights. Finally, while Xiong studied linear models with data-intensive variables related to airline revenues, her work did not consider some simple variables from previous research, such as the minutes of delay multiplied by a weight related to the time of day or destination airport [19, 113].

The goal of the research in this chapter is to find, from a set of proposed separable delay cost models, the models and corresponding additive cost noise parameters that maximize the likelihood of historical airline decisions in Airspace Flow Programs. To this end, a heuristic is developed that finds cost noise parameters that maximize an approximation of the log-likelihood of the airline decision data.

The remainder of this chapter is structured as follows. Section 3.2 provides background information about Airspace Flow Programs. We define the airline decision model, maximum likelihood estimation problem, and two heuristics for this problem in Section 3.3.
In Section 3.4, we propose and solve estimation problem instances with known noise parameters to investigate the validity of the heuristics. We evaluate proposed airline delay cost models in Section 3.5 and provide conclusions in Section 3.6. The research described in this chapter was initially presented in [32] and [22]. The author of this dissertation made major contributions to the research described in and the writing of these papers, and is the first author on both.

3.2 Background

An Airspace Flow Program (AFP) is a mechanism used by the Federal Aviation Administration (FAA) to assign departure delays to aircraft in order to reduce demand for a region of airspace known as a Flow Constrained Area (FCA). AFPs are based on slots. A slot is the right to fly into an FCA during a specified period of time. The FAA assigns departure times to flights so that each flight arrives at the FCA approximately at the time of the slot to which it is assigned. Slots are allocated to airlines with an algorithm that is based on a first-scheduled-first-served (FSFS) principle. By default, each airline's flights are assigned to their allocated slots in an FSFS manner, but the airline can adjust this assignment. Some airlines alter assignments of flights to slots in AFPs thousands of times each year. In this chapter, records of these airline assignment decisions are used to evaluate airline delay cost models.
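The default FSFS assignment described above can be sketched in a few lines. This is a hypothetical illustration (the flight and slot data are made up), not the FAA's actual allocation algorithm:

```python
# Sketch of the default first-scheduled-first-served (FSFS) assignment of an
# airline's flights to its allocated slots: flights sorted by scheduled FCA
# arrival time are matched to slots sorted by slot time. Data are hypothetical.

def fsfs_assignment(flights, slots):
    """flights: list of (flight_id, scheduled_fca_arrival_minutes)
    slots:   list of (slot_id, slot_time_minutes)
    Returns a list of (flight_id, slot_id, delay_minutes)."""
    flights_sorted = sorted(flights, key=lambda f: f[1])
    slots_sorted = sorted(slots, key=lambda s: s[1])
    assignment = []
    for (fid, t_f), (sid, t_s) in zip(flights_sorted, slots_sorted):
        # Delay is slot time minus scheduled arrival, floored at zero.
        assignment.append((fid, sid, max(0, t_s - t_f)))
    return assignment

flights = [("F1", 600), ("F2", 615), ("F3", 605)]
slots = [("S1", 610), ("S2", 620), ("S3", 640)]
print(fsfs_assignment(flights, slots))
# → [('F1', 'S1', 10), ('F3', 'S2', 15), ('F2', 'S3', 25)]
```

An airline unhappy with this default ordering (for example, if F2 is far more delay-sensitive than F3) can then swap slot assignments among its own flights, and it is those adjustments that are analyzed in this chapter.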

3.3 Method

3.3.1 Airline Decision Model

Airlines have tools and procedures that allow them to make acceptable decisions during an AFP, but these decisions are complicated and difficult for researchers to model [76]. For example, the impact of delaying a flight is difficult to compute because passenger, luggage, crew, and aircraft connections mean that delaying one flight may impact several other flights. To make this problem tractable, a separable delay cost model will be utilized. Specifically, it is assumed that airlines attempt to minimize the sum of the delay costs associated with assigning each flight to each slot.

If airlines minimize a separable delay cost and do not consider possibilities like canceling flights or adjusting the assignment later, then the decision faced by airlines when assigning flights to slots is well-known and referred to as the minimum cost perfect matching problem. Given a set of flights and slots, a matching is a set of connections between flights and slots such that each flight is matched to at most one slot and vice versa. A perfect matching is any matching in which no flight or slot is left unmatched. Several algorithms can solve this problem efficiently, even for cases where

there are hundreds or thousands of flights and slots. This is not necessarily the case for the more sophisticated decision models. Similar decision models are used in some airline slot assignment decision-support tools [35, 67, 98] and other previous research on assignments of flights to slots by airlines [16, 92, 125, 126]. More sophisticated models and solution algorithms have been developed that consider the possibility of canceling flights, non-separable cost models, and other issues [124]. These models may be more accurate, but this minimum cost perfect matching decision model allows for tractable evaluations of delay cost models with AFP data.

Let F be the set of flights belonging to an airline in a matching, and let S be the set of all of the airline's slots in the matching. The number of flights and slots is n. Associated with each flight f_i ∈ F are various characteristics such as the airline operating the flight (a_{f_i}), the scheduled time of arrival at the constrained resource (t_{f_i}), an estimate of the number of passengers on the flight (p_{f_i}), and the aircraft type used for the flight (e_{f_i}). These characteristics are leveraged by various delay cost models. Associated with each individual slot s_j ∈ S is a time t_{s_j} and a time window [t_{s_j}, t_{s_j} + δ] for some δ ≥ 0. A flight f_i can only be assigned to a slot s_j if it can arrive at the FCA before the end of the time window corresponding to the slot (that is, if t_{f_i} ≤ t_{s_j} + δ). There are historical assignments of flights to slots where the scheduled time of arrival is after the slot time (t_{f_i} > t_{s_j}), so delay is computed as d(f_i, s_j) = max{0, t_{s_j} − t_{f_i}}.

The set of historical matchings of flights and slots by an airline is denoted by (F, S, M). An element (F, S, M) ∈ (F, S, M) contains the set of flights F and set of slots S associated with the perfect matching M selected by an airline. Matrix M is a square n × n binary matrix with an entry for each possible assignment of a flight to a slot. Element M_ij is 1 if f_i is assigned to slot s_j and is 0 otherwise. For a cost model that computes a cost of c(f_i, d) associated with delaying flight f_i by d minutes, the cost of a matching is Σ_{i=1}^n Σ_{j=1}^n c(f_i, d(f_i, s_j)) M_ij. If a non-separable cost model were used, this equation could not be expressed as a sum of individual flight delay costs.

Even if airlines do minimize a separable delay cost when matching flights and slots, it is unlikely that their delay cost model can be computed exactly from publicly
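The minimum cost perfect matching decision can be sketched with an off-the-shelf assignment solver. The sketch below assumes SciPy is available and uses a made-up linear per-minute delay cost as a stand-in for c(f_i, d); it is an illustration, not the airlines' or the dissertation's tooling:

```python
# Sketch of the minimum cost perfect matching decision using SciPy's
# Hungarian-algorithm solver. The per-minute delay cost weights are a
# hypothetical stand-in for a delay cost model c(f_i, d); times are made up.
import numpy as np
from scipy.optimize import linear_sum_assignment

flight_times = np.array([600, 605, 615])       # scheduled FCA arrivals [min]
slot_times = np.array([610, 620, 640])         # slot times [min]
cost_per_minute = np.array([1.0, 3.0, 2.0])    # hypothetical per-flight weights

# d(f_i, s_j) = max{0, t_sj - t_fi}; assignment cost C_ij = w_i * d(f_i, s_j).
delay = np.maximum(0, slot_times[None, :] - flight_times[:, None])
cost = cost_per_minute[:, None] * delay

rows, cols = linear_sum_assignment(cost)       # minimum cost perfect matching
print(list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum()))
# → [(0, 2), (1, 0), (2, 1)] 65.0
```

Note how the delay-sensitive middle flight takes the earliest slot even though it is not first-scheduled, which is exactly the kind of deviation from the FSFS default that the historical data record.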

available data [76]. One way to handle this issue is to add a noise term to the delay cost model to account for unobserved factors that impact the cost. Discrete choice models can be derived by assuming particular distributions for such noise terms [131]. We assume that the actual cost of delaying f_i by assigning it to s_j is c(f_i, d(f_i, s_j)) + ε_ij. The deterministic part of the cost that can be computed using only observed publicly-available data is c(f_i, d(f_i, s_j)), and the stochastic part that accounts for unobserved factors known only to the airline is ε_ij. We assume that ε_ij is identically and independently distributed (iid) for all i, j ∈ {1, 2, ..., n}. This assumption is unlikely to be true in some cases because, for example, if delays for a particular f_{i′} are costly in a way that is not accounted for by the deterministic part of the cost model, the corresponding additive cost noise random variables ε_{i′j} ∀j = 1, 2, ..., n may not be independent and may all have a positive mean that other ε_ij variables do not have. Although unlikely in some cases, this assumption is made because it enables the development of the heuristic described in Section 3.3.3. Additionally, we assume that the distribution of the ε_ij variables is Gaussian with mean µ and variance σ^2.

Under these assumptions, the minimum cost perfect matching problem faced by the airlines is

    minimize    Σ_{i=1}^n Σ_{j=1}^n (c(f_i, d(f_i, s_j)) + ε_ij) X_ij
    subject to  Σ_{j=1}^n X_ij = 1    ∀i ∈ {1, 2, ..., n}              (3.1)
                Σ_{i=1}^n X_ij = 1    ∀j ∈ {1, 2, ..., n}
                X_ij ∈ {0, 1}         ∀i, j ∈ {1, 2, ..., n},

where variable X_ij is 1 when f_i is assigned to s_j and 0 otherwise. Since ε_ij models factors impacting the cost that are known by the airline but unobserved by the public, airlines are assumed to solve a deterministic optimization problem in which the realized ε_ij values for all i, j ∈ {1, 2, ..., n} are revealed before the optimization.
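Problem (3.1) can be sketched numerically: sample the realized noise, add it to the deterministic costs, and solve the resulting deterministic matching. All numbers here are made up; this is an illustration of the decision model, not of any airline's actual process:

```python
# Sketch of problem (3.1): the airline observes a realized Gaussian noise
# sample eps_ij on every flight-slot cost, then solves the resulting
# deterministic minimum cost perfect matching. All numbers are made up.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 4
base_cost = rng.uniform(0.0, 100.0, size=(n, n))   # c(f_i, d(f_i, s_j))

_, cols_det = linear_sum_assignment(base_cost)     # matching without noise

eps = rng.normal(0.0, 5.0, size=(n, n))            # mu = 0, sigma^2 = 25
_, cols_noisy = linear_sum_assignment(base_cost + eps)

# Unobserved factors can change which matching the airline selects.
print(cols_det.tolist(), cols_noisy.tolist())
```

From the public's perspective only `base_cost` and the chosen matching are observable, which is why the chapter estimates the noise parameters rather than the realized ε_ij values themselves.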

3.3.2 Estimation Problem Statement

The problem under consideration in this chapter is to find the delay cost models and corresponding noise parameters that maximize the likelihood of airline decisions in AFPs. More precisely, for each airline and for each candidate cost model c_k, we seek to find the noise parameters θ that maximize the likelihood L_k(θ) = Prob(M | θ, F, S). If each (F, S, M) ∈ (F, S, M) is independent, the log-likelihood is

    ℓ_k(θ) = Σ_{(F,S,M) ∈ (F,S,M)} log Prob(M | θ, F, S).    (3.2)

The delay cost model that maximizes the likelihood is the one for which θ values are found that achieve the largest value for ℓ_k(θ) (and therefore also the largest L_k(θ)).

3.3.3 Linear Program Cost Approximate Maximum Likelihood Estimation

The Linear Program Cost Approximate Maximum Likelihood Estimation (LPCAMLE) heuristic attempts to find the maximum likelihood estimates of the parameters of a Gaussian noise term that is added to linear program (LP) cost vectors. It uses a data set consisting of LP instance coefficient values and corresponding LP solutions, but not any samples from the noise distribution. It is based on an approximation of the likelihood motivated by LP sensitivity analysis, referred to as the LP Sensitivity Analysis (LPSA) approximation. This section utilizes slightly different notation than other sections of this chapter in order to more closely match the notation traditionally used when describing LPs and random variables.

Suppose an entity solves a set of N deterministic LPs of the form

    minimize    (c_i + ϵ_i)′ x_i
    subject to  A_i x_i = b_i                    (3.3)
                x_i ≥ 0.

The vector ϵ_i is a sample from a random vector E_i, and the sample is provided before the LP is solved. Each element of each random vector E_i (for i = 1, ..., N) is an independent random variable with the same Gaussian distribution that has mean µ and variance σ^2. Let θ = (µ, σ^2) denote the parameters of this Gaussian distribution. The probability density function (pdf) of E_i is f_{E_i}(·; θ). The LP instance coefficient and solution data available when estimating θ are {x_i^*, A_i, b_i, c_i}_{i=1}^N, where x_i^* is the solution of problem instance i selected by the entity under observation. Each x_i^* is a sample from a random variable X_i with pdf f_{X_i}. This pdf is parameterized by θ and dependent upon LP instance coefficient and solution data, so it is written as f_{X_i}(·; θ, A_i, b_i, c_i). The randomness in X_i results from the randomness in E_i, and different samples ϵ_i from E_i can lead to the same x_i^*.

Let J^*(A_i, b_i, c_i, ϵ_i) be the optimal value of problem (3.3) for a noise sample ϵ_i and for problem parameters A_i, b_i, and c_i. Similarly, let x_i^{*,ϵ_i} be an optimal solution to LP instance i with noise sample ϵ_i. LP sensitivity analysis can be used to approximate the optimal value of problem (3.3) as ϵ_i changes [34]. The approximation is

    J^*(A_i, b_i, c_i, ϵ_i) ≈ J^*(A_i, b_i, c_i, 0) + ϵ_i′ x_i^{*,0}.    (3.4)
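Approximation (3.4) can be checked numerically on a small LP. The sketch below (assuming SciPy's `linprog`; the 3×3 assignment instance and noise scale are made up) compares the perturbed optimal value with the first-order approximation:

```python
# Numeric check of approximation (3.4) on a small LP: the optimal value with
# perturbed costs is compared with J*(A,b,c,0) + eps' x^{*,0}. The 3x3
# assignment instance and noise scale below are made up for illustration.
import numpy as np
from scipy.optimize import linprog

n = 3
rng = np.random.default_rng(1)
c = rng.uniform(1.0, 10.0, size=n * n)

# Equality constraints of an assignment LP: row sums and column sums are 1.
A = np.zeros((2 * n, n * n))
for i in range(n):
    for j in range(n):
        A[i, i * n + j] = 1.0        # flight i assigned to exactly one slot
        A[n + j, i * n + j] = 1.0    # slot j receives exactly one flight
b = np.ones(2 * n)

res0 = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))    # noise-free LP
x0 = res0.x                                            # x^{*,0}

eps = rng.normal(0.0, 0.05, size=n * n)                # small noise sample
res_eps = linprog(c + eps, A_eq=A, b_eq=b, bounds=(0, None))

exact = res_eps.fun                  # J*(A, b, c, eps)
approx = res0.fun + eps @ x0         # right-hand side of (3.4)
print(abs(exact - approx))           # small when the optimal basis is unchanged
```

For noise small enough that the optimal vertex does not change, the two quantities coincide up to solver tolerance; for larger noise the approximation degrades, which is one source of error in the LPSA approximation that follows.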

The likelihood of θ when given a solution x_i^* for a single LP instance is

    f_{X_i}(x_i^*; θ, A_i, b_i, c_i) = ∫ f_{E_i}(ϵ_i; θ) 1{(c_i + ϵ_i)′ x_i^* = J^*(A_i, b_i, c_i, ϵ_i)} dϵ_i,    (3.5)

where 1{a} is an indicator function that returns 1 if a is true and 0 otherwise. Approximation (3.4) can be used to approximate the expression inside the indicator function, leading to an approximation of the likelihood:

    f_{X_i}(x_i^*; θ, A_i, b_i, c_i) ≈ ∫ f_{E_i}(ϵ_i; θ) 1{(c_i + ϵ_i)′ x_i^* = J^*(A_i, b_i, c_i, 0) + ϵ_i′ x_i^{*,0}} dϵ_i.    (3.6)

Performing some simplifications leads to an expression involving a weighted sum of

the elements of ϵ_i:

    f_{X_i}(x_i^*; θ, A_i, b_i, c_i) ≈ ∫ f_{E_i}(ϵ_i; θ) 1{ϵ_i′ (x_i^{*,0} − x_i^*) = c_i′ (x_i^* − x_i^{*,0})} dϵ_i.    (3.7)

Since each element of ϵ_i is an independent sample from a Gaussian distribution with mean µ and variance σ^2, the weighted sum ϵ_i′ (x_i^{*,0} − x_i^*) of these elements also has a Gaussian distribution. In particular, this implies that the integral in eq. (3.7) is simply the probability density function (pdf) of a Gaussian random variable with mean 1′(x_i^{*,0} − x_i^*)µ and variance ‖x_i^{*,0} − x_i^*‖_2^2 σ^2, evaluated at c_i′ (x_i^* − x_i^{*,0}). Therefore, we arrive at the LPSA approximation of the likelihood:

    f_{X_i}(x_i^*; θ, A_i, b_i, c_i) ≈ f(c_i′ (x_i^* − x_i^{*,0}); 1′(x_i^{*,0} − x_i^*)µ, ‖x_i^{*,0} − x_i^*‖_2^2 σ^2),    (3.8)

where f(α; β, γ) is the pdf of a Gaussian random variable with mean β and variance γ, evaluated at α. LPCAMLE attempts to maximize the log-likelihood ℓ(θ) of all of the N LP instance solutions:

    ℓ(θ) = Σ_{i=1}^N log f_{X_i}(x_i^*; θ, A_i, b_i, c_i).    (3.9)

To create a tractable problem, LPCAMLE uses the LPSA approximation (3.8) to instead maximize the approximate log-likelihood

    ℓ̂(θ) = Σ_{i=1}^N log f(c_i′ (x_i^* − x_i^{*,0}); 1′(x_i^{*,0} − x_i^*)µ, ‖x_i^{*,0} − x_i^*‖_2^2 σ^2).    (3.10)

A maximization problem with objective function ℓ̂(θ) can be solved analytically using the first-order necessary conditions and the form of the pdf of a Gaussian random variable. The resulting LPCAMLE estimates θ^* = (µ^*, σ^{2,*}) are

    µ^* = ( Σ_{i=1}^N p_i v_i / q_i ) / ( Σ_{i=1}^N p_i^2 / q_i )    (3.11)

and

    σ^{2,*} = (1/N) Σ_{i=1}^N (v_i − p_i µ^*)^2 / q_i,    (3.12)

where p_i = 1′(x_i^{*,0} − x_i^*), q_i = ‖x_i^{*,0} − x_i^*‖_2^2, and v_i = c_i′ (x_i^* − x_i^{*,0}). If p_i = q_i = 1 for all i and the v_i are viewed as the sample data, equations (3.11) and (3.12) simplify to the standard maximum likelihood estimates of the parameters of a normal distribution. If x_i^* = x_i^{*,0}, then q_i = 0 and the terms in the sums in eqs. (3.11) and (3.12) are invalid. This situation corresponds to evaluating the pdf of a Gaussian random variable with a variance of zero in the LPSA approximation (3.8). It is not clear how to handle this situation; such instances are skipped for now. Finally, because a non-zero µ only leads to a bias in a cost vector (or delay cost model), in the remainder of this chapter we do not utilize eq. (3.11) but rather set µ^* = 0 and compute σ^{2,*} accordingly via eq. (3.12).

To apply LPCAMLE to the estimation problem in Section 3.3.2, problem (3.1) must be posed as an LP. If the appropriate vectors c_i, b_i, x_i, and ϵ_i and matrix A_i are constructed, the LP relaxation of the integer program (3.1) is indeed identical to the LP (3.3). Each of the n² elements of the vector ϵ_i corresponds to one ε_ij. Due to the total unimodularity of the appropriate A_i matrix and the integrality of the appropriate b_i vector, this LP is guaranteed to produce the same minimum cost value as the corresponding integer program (3.1), even though the LP solution may not be integral [62]. Therefore, the LPCAMLE heuristic can be applied in an attempt to solve the estimation problem posed in Section 3.3.2.
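The closed-form estimates (3.11) and (3.12) can be sketched directly from the per-instance summaries p_i, q_i, and v_i. The numeric values below are made up for illustration; the skipping of q_i = 0 instances and the µ^* = 0 convention follow the text:

```python
# Sketch of the closed-form LPCAMLE estimates (3.11)-(3.12), computed from
# per-instance summaries p_i, q_i, and v_i. The numeric values are made up;
# instances with q_i = 0 are skipped, as in the text.
import numpy as np

def lpcamle_estimates(p, q, v, fix_mu_to_zero=True):
    """Return (mu*, sigma2*) per eqs. (3.11) and (3.12)."""
    p, q, v = (np.asarray(a, dtype=float) for a in (p, q, v))
    keep = q > 0                      # skip instances with x* = x^{*,0}
    p, q, v = p[keep], q[keep], v[keep]
    if fix_mu_to_zero:
        mu = 0.0                      # convention used in the chapter
    else:
        mu = float(np.sum(p * v / q) / np.sum(p ** 2 / q))   # eq. (3.11)
    sigma2 = float(np.mean((v - p * mu) ** 2 / q))           # eq. (3.12)
    return mu, sigma2

p = [1.0, 2.0, 0.0, -1.0]
q = [1.0, 4.0, 0.0, 2.0]
v = [3.0, -2.0, 0.0, 4.0]
print(lpcamle_estimates(p, q, v))     # → (0.0, 6.0)
```

With `fix_mu_to_zero=False` and p_i = q_i = 1, the function reduces to the ordinary sample mean and (biased) sample variance of the v_i, matching the simplification noted after eq. (3.12).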

3.3.4 Simulation Approximate Likelihood Estimation

The likelihood can also be approximated using simulations [93]. This can be accomplished by selecting a problem instance i_b randomly from the N matchings in (F, S, M), generating a corresponding noise sample ϵ_b from a distribution with parameters θ, and then solving the corresponding LP problem instance (3.3) to find the minimum cost value c_{i_b}′ x_{i_b}^{*,ϵ_b} for the airline matching problem (3.1) with noise sample ϵ_b. If this is done B times, then the simulation approximation of the likelihood is

B & % & %,(b 1 (ci + +b) x − (ci + +b) x M|θ, F, S ≈ 1 b ib b ib ≤ ξ , Prob( ) & %,(b (3.13) B (ci + +b) x b 4 b ib 5 !=1 where ξ is a parameter specifying how close the cost induced by the historical solution % data xib must be to the optimal cost to be considered an optimal solution for this

problem instance with sampled noise +b. K In this case, a set Θ= {θk}k=1 of candidate values for θ must be defined. Then, the maximum likelihood θ% is computed as

$$\theta^* \in \operatorname*{arg\,max}_{\theta_k \in \Theta} \mathrm{Prob}(\mathcal{M} \mid \theta_k, \mathcal{F}, \mathcal{S}), \tag{3.14}$$

where approximation (3.13) is used to approximate Prob(M|θk, F, S).
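The simulation approximation in eqs. (3.13) and (3.14) can be sketched as follows. This is a hedged Python version (the dissertation's implementation used Matlab and CVX); it uses `scipy.optimize.linprog` as the LP solver, assumes positive optimal costs so that the relative gap in (3.13) is well defined, and the function name and data layout are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def sim_likelihood(theta, instances, x_star, B=1000, xi=0.01, seed=0):
    """Simulation approximation of Prob(M | theta, F, S), eq. (3.13).

    instances : list of (c, A_eq, b_eq) tuples defining the N LP relaxations
    x_star    : list of historically observed solutions x_i^*"""
    mu, sigma2 = theta
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(B):
        i = rng.integers(len(instances))             # random problem instance i_b
        c, A_eq, b_eq = instances[i]
        eps = rng.normal(mu, np.sqrt(sigma2), size=c.shape)  # noise sample eps_b
        opt = linprog(c + eps, A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).fun
        hist = (c + eps) @ x_star[i]                 # cost of the historical solution
        if (hist - opt) / opt <= xi:                 # historical solution near-optimal?
            hits += 1
    return hits / B

# Eq. (3.14) is then a grid search over a finite candidate set Theta, e.g.
#   theta_hat = max(Theta, key=lambda t: sim_likelihood(t, instances, x_star))
```

With $B = 1000$ and $\xi = 0.01$ (the values used in Section 3.4.1), each likelihood evaluation requires solving $B$ LPs, which is why this heuristic is much slower than LPCAMLE.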

3.4 Validation

3.4.1 Implementation Notes

The heuristics were implemented and tested in Matlab. LP instances were solved with CVX, a package for specifying and solving convex programs in Matlab [65]. The set $\Theta$ contained possible values for $\sigma^2$ of $\{1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$. The mean was 0 for each $\theta \in \Theta$. When approximating likelihoods with simulation as in eq. (3.13), $B$ was set to 1000 and $\xi$ was set to 0.01. These values were selected because they performed relatively well on the problem instances under investigation.

When comparing outcomes of the LPCAMLE heuristic across different cost models in Sections 3.4.3 and 3.5, the assignment costs must be normalized to have roughly the same magnitude so that fair comparisons of variance estimates and approximate log-likelihoods are possible. To allow for such comparisons, the assignment cost for each cost model was normalized by the average observed assignment cost for the cost model ($\bar{c}$).

Table 3.1: Comparison of Heuristics on Sample Problem Instances ($\sigma^2 = 25$)

  Seed   LPCAMLE $\sigma^{2*}$   Simulation $\sigma^{2*}$
  0      26.8                    1
  1      25.9                    1
  2      24.8                    1
  3      33.6                    1
  4      68.1                    1

© 2012 IEEE

3.4.2 Comparison of Heuristics

Before using LPCAMLE to evaluate airline delay cost models, some sample estimation problem instances based on generic LPs were specified and solved to demonstrate the behavior of the LPCAMLE and simulation heuristics.

For these sample estimation problem instances, $A_i = A$, $b_i = b$, and $c_i = c$ for all of the $N = 1000$ solved LP problem instances in the data set. The noise parameters were $\theta = (0, 25)$. The size of $A$ was $100 \times 50$. Each element in $A$ was generated by sampling from a Gaussian distribution with mean zero and standard deviation 100. Each element in the cost vector $c$ was sampled from a uniform distribution on $[0, 100]$. The $b$ vector was computed by post-multiplying $A$ by a vector of ones. For this portion of the validation, the heuristics used the true cost vector $c$ when estimating the variance.

Different LP coefficients and corresponding LP solution data sets for different estimation problem instances were generated by initializing a random number generator with different seed values. Then, the LPCAMLE and simulation heuristics were executed with the generated data sets to estimate the maximum likelihood estimates for $\theta$.

Table 3.1 shows the results of the two heuristics. For three of the five estimation problem instances (produced using random number generator seeds 0, 1, and 2), the LPCAMLE variance estimate $\sigma^{2*}$ was within 8% of the actual variance $\sigma^2 = 25$. For one of the other cases the estimate was off by almost 35% and in the final case it was off by 172%. It seems that characteristics of the problem coefficients $A$, $b$, and $c$ can

impact the quality of the LPCAMLE estimates. The heuristic based on the simulation approximate likelihood, on the other hand, estimated a variance value of 1 for each of the problem instances. This poor performance was not analyzed in detail, but it might be related to small cost noise values failing to perturb the problem cost vector sufficiently to drive the solution to an $x_i^{*,\varepsilon}$ that is not equal to the zero-noise solution $x_i^{*,0}$. Furthermore, the simulation-based heuristic required almost 1000 seconds of computation time, while the LPCAMLE heuristic completed in about 0.5 seconds. Due to its poor performance, no further results from the simulation-based heuristic are presented.
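The sample estimation problem instances described above can be generated roughly as follows. This is a sketch under stated assumptions: the original Matlab random number generator and seeds are not reproduced here, and the constraint form of the generic LP is not fully specified in the text, so this version simply builds $A$, $b$, and $c$ so that the all-ones vector satisfies $Ax = b$ exactly.

```python
import numpy as np

def make_sample_instance(seed, m=100, n=50):
    """Generate coefficients for one sample estimation problem family:
    A ~ N(0, 100^2) elementwise, c ~ U[0, 100] elementwise, b = A @ 1."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 100.0, size=(m, n))   # 100 x 50, std dev 100
    c = rng.uniform(0.0, 100.0, size=n)       # cost vector on [0, 100]
    b = A @ np.ones(n)                        # b computed from A and a ones vector
    return A, b, c
```

Different seeds then yield the different estimation problem instances compared in Table 3.1.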

3.4.3 LPCAMLE Validation with Synthetic Matching Data

The performance of LPCAMLE was investigated with some synthetic airline matching data generated by solving the airline matching problem (3.1) with known delay cost models and noise sampled from known distributions. This synthetic data was based on the historical matchings of two airlines, referred to as airline E and airline G. There were 1368 matchings for airline E and 473 matchings for airline G. The delay cost models will be explained in Section 3.5. Six of the delay cost models that performed relatively well on the historical matching data were investigated in this validation work. They are cost models 2, 5–8, and 16 (see Table 3.4). The standard deviation of the noise ($\sigma$) was varied in this validation work. It was assigned four values ranging from $\bar{c}$ down to $\bar{c}/10$.

For each of two airline data sets, six delay cost models, and four zero-mean additive cost noise standard deviation levels, a set of synthetic minimum cost perfect matchings was generated for the flights and slots in the airline matchings. Then, for each of these synthetic sets of matchings, the LPCAMLE heuristic was applied with each of the six cost models used as the candidate cost model. The additive cost noise variance estimates and approximate log-likelihood values were recorded for each of these candidate cost models. If LPCAMLE worked perfectly, it would produce exact estimates of the additive cost noise standard deviation and the cost model used to generate the synthetic data would always achieve the largest approximate log-likelihood.

Tables 3.2 and 3.3 show which of the candidate cost models achieved the largest approximate log-likelihood values in each synthetic scenario for airline E and airline G, respectively. For airline E, the generating cost model achieved the largest approximate log-likelihood in 23 of the 24 instances.

Table 3.2: Candidate Cost Models with Largest Approximate Log-Likelihood for Airline E

  Generating Cost Model   $\sigma/\bar{c}=0.1$   $0.25$   $0.5$   $1.0$
  2                       2      2      2      2
  5                       5      5      5      5
  6                       6      6      6      6
  7                       7      7      7      7
  8                       8      8      8      16
  16                      16     16     16     16

© 2012 IEEE

Table 3.3: Candidate Cost Models with Largest Approximate Log-Likelihood for Airline G

  Generating Cost Model   $\sigma/\bar{c}=0.1$   $0.25$   $0.5$   $1.0$
  2                       2      2      2      2
  5                       2      2      2      2
  6                       6      2      2      2
  7                       7      2      2      2
  8                       2      2      2      2
  16                      2      2      2      2

© 2012 IEEE

For airline G, cost model 2 (Passenger Delay) often achieved the largest approximate log-likelihood. Airline G used the same aircraft type for almost every flight. Also, an annual average load factor was used for all flights. Therefore, cost model 2 failed to differentiate between matchings, and almost any matching achieved the minimum cost. This caused LPCAMLE to compute a relatively large $\hat\ell(\sigma^{2*})$ for

cost model 2. In this sort of situation it may be wise to remove such cost models from consideration. If the deterministic part of an airline's cost model fails to differentiate between most matchings, its optimal matchings would be determined almost entirely by realizations of the cost noise terms. These are assumed to depend only on factors not observed by the public, suggesting that its matchings do not depend on publicly-observed factors, which seems unlikely. If cost model 2 were removed, then there would only be four instances when the generating cost model did not achieve the largest $\hat\ell(\sigma^{2*})$.

In Fig. 3.1, the estimates of the normalized standard deviation ($\sigma^*/\bar{c}$) are plotted as a function of the actual normalized standard deviations ($\sigma/\bar{c}$) used to generate the synthetic matchings. There is an estimate for each of the six generating cost models. Larger actual values ($\sigma/\bar{c}$) led to larger estimates ($\sigma^*/\bar{c}$). However, for both airlines, the estimates were severe under-estimates of the actual standard deviation values. Further research into LPCAMLE is needed to understand the reason for these underestimates.

3.5 Evaluating Airline Delay Cost Models

We evaluated a set of candidate cost models. The models depended on publicly available (or at least approximable) characteristics of the flight $f$ (described in Section 3.5.1) and the minutes that the flight was delayed $d$. The cost models that were evaluated are documented in Table 3.4.

The US Department of Transportation considers a flight delayed when it arrives 15 or more minutes after its scheduled arrival time, and it reports "on-time performance" data that may impact customer perception of airlines. Airlines attempt to reduce the number of flights that are counted as delayed [54, 67, 76]. Therefore, the first cost model is equal to 1 if a flight will be counted as delayed and 0 otherwise.

The second cost model is the minutes of delay multiplied by the number of passengers on the flight. This cost model has been used in an airline decision-support tool [124], and it is related to the number of passengers that will miss a connection when a flight is delayed by some amount. Cost models 3 and 4 are the squared

[Plots of estimated versus actual normalized standard deviation for (a) Airline E and (b) Airline G. © 2012 IEEE]

Figure 3.1: Normalized standard deviation estimates for synthetic data generated with four normalized actual standard deviation values.

Table 3.4: Delay Cost Models

  Number   Name                                                 Model
  1        On-time Performance                                  $c_1(f,d) = \mathbf{1}\{d > 15\}$
  2        Passenger Delay                                      $c_2(f,d) = p_f d$
  3        Squared Delay                                        $c_3(f,d) = d^2$
  4        Squared Passenger Delay                              $c_4(f,d) = (p_f d)^2$
  5        Time-of-Day Delay                                    $c_5(f,d) = \beta(t_f,d)\,d$
  6        Connection Delay                                     $c_6(f,d) = \gamma(f)\,d$
  7        Airline Connection Delay                             $c_7(f,d) = \gamma'(f,a_f)\,d$
  8        Monetary Delay                                       $c_8(f,d) = \eta(e_f,d)\,d$
  9        Step Function                                        $c_9(f,d) = \rho(d)$
  10       Time-of-Day Connection Delay                         $c_{10}(f,d) = \beta(t_f,d)\gamma(f)\,d$
  11       Time-of-Day Passenger Delay                          $c_{11}(f,d) = \beta(t_f,d)p_f d$
  12       Connection Passenger Delay                           $c_{12}(f,d) = \gamma(f)p_f d$
  13       Time-of-Day Connection Passenger Delay               $c_{13}(f,d) = \beta(t_f,d)\gamma(f)p_f d$
  14       Time-of-Day Monetary Delay                           $c_{14}(f,d) = \beta(t_f,d)\eta(e_f,d)\,d$
  15       Connection Monetary Delay                            $c_{15}(f,d) = \gamma(f)\eta(e_f,d)\,d$
  16       Connection and Monetary Combination Delay            $c_{16}(f,d) = \alpha_{16} c_6(f,d) + (1-\alpha_{16}) c_8(f,d)$
  17       Airline Connection and Monetary Combination Delay    $c_{17}(f,d) = \alpha_{17} c_7(f,d) + (1-\alpha_{17}) c_8(f,d)$

delay and squared passenger delay, respectively. Models of this form are proposed in [16]; they provide a simple means of capturing increasing marginal delay costs due to missed connections.

The Time-of-Day Delay (cost model 5) is the minutes of delay multiplied by a multiplier that is a function of the scheduled time of arrival and the minutes of delay. This model is based on [19], in which an airline schedule was analyzed to quantify how the magnitude and time of day of a delay impact airline delay costs. It has also been used in airline decision-support tools [73]. The form of the multiplier $\beta(t_f, d)$ can be found in [19].

Cost model 6 is referred to as Connection Delay and it was proposed in [113] and [117]. It attempts to capture the fact that delaying flights bound for hub airports is especially costly because these flights are likely to involve passengers, crews, and aircraft that need to connect to other flights. The cost is computed as the minutes of delay times a multiplier $\gamma(f)$ that is 2 for flights bound for high connection rate airports (also known as hubs), 1.5 for flights bound for medium connection rate airports, and 1 for all other flights. The classification of airports into these categories is specified in [117]. An airline-specific version of this cost model was also developed and is referred to as the Airline Connection Delay (cost model 7). In this model, the multiplier $\gamma'(f, a_f)$ is a function of the airline: the high connection rate and medium connection rate airports vary from airline to airline.

Previous research has attempted to calculate the monetary cost of delay in Europe [45]. This work has been adapted for the US market [80]. Cost model 8 is the Monetary Delay, and it is an implementation of the model in [80]. This cost model is also computed by multiplying the minutes of delay by a multiplier $\eta(e_f, d)$ that is a sum of per-minute fuel, crew, maintenance, passenger, and other costs.
Cost model 9 is referred to as the Step Function because it generates costs that increase in discrete "steps" as various delay thresholds are exceeded. The motivation for this form is that, for example, a delay of less than 60 minutes is assumed to provide sufficient time for passengers to make connections but a delay greater than 60 minutes does not. A model of this form has been used in airline decision-support tools [35, 67].
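Several of the simpler entries in Table 3.4 can be written down directly. The Python sketch below implements a few of them; the step thresholds for cost model 9 and the multiplier values passed to cost model 16 are illustrative placeholders, since the actual $\rho(d)$, $\gamma(f)$, and $\eta(e_f, d)$ come from the cited references.

```python
# d is delay in minutes; p_f is the passenger count of flight f.

def c1_on_time(d):
    """Cost model 1: 1 if the flight is counted as delayed, else 0."""
    return 1.0 if d > 15 else 0.0

def c2_passenger_delay(d, p_f):
    """Cost model 2: minutes of delay times number of passengers."""
    return p_f * d

def c3_squared_delay(d):
    """Cost model 3: squared delay, capturing increasing marginal costs."""
    return d ** 2

def c9_step(d, thresholds=(60, 120, 180)):
    """Cost model 9 (sketch): cost rises by one step per threshold exceeded.
    The threshold schedule here is an assumption, not the published rho(d)."""
    return float(sum(d > t for t in thresholds))

def c16_combination(d, gamma_f, eta_fd, alpha=13 / 14):
    """Cost model 16: convex combination of Connection Delay (gamma_f * d)
    and Monetary Delay (eta_fd * d), with alpha in [0, 1]."""
    return alpha * (gamma_f * d) + (1 - alpha) * (eta_fd * d)
```

For instance, under cost model 6 a 30-minute delay to a hub-bound flight ($\gamma = 2$) costs as much as a 60-minute delay to a non-hub flight, which is exactly the preference structure the Connection Delay model is meant to encode.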

The remaining eight cost models include two or more of these first nine cost models. Many factors impact airline decisions [76], and these models involve more factors than any of the first nine models do on their own. Cost models 10–15 involve a product of two or more multipliers and delay. For example, the product of the Time-of-Day Delay multiplier, the Monetary Delay multiplier, and the delay is cost model 14, which has been used in previous research [60, 61]. Cost models 16 and 17 are convex combinations of two other cost models (which means that $\alpha_{16}$ and $\alpha_{17}$ are both elements of $[0, 1]$). The $\alpha_{16}$ and $\alpha_{17}$ parameters in these cost models were tuned by trying a set of possible values; the best performance was observed when they were both set to $13/14$.

3.5.1 Data

The historical matchings used in this study were recorded in Expected Departure Clearance Time (EDCT) log files from 34 days in June–August 2006. The files contain information about airline decisions during 32 AFPs on these days [1]. "Simplified Substitution" messages in these files specify sets of flights, sets of slots, and the corresponding airline-selected matching.

These messages contain enough information to define the minimum cost perfect matching problems that it is assumed that the airline solved when selecting the specified matching. However, some assumptions were made in processing the EDCT log

file data. For example, the scheduled time of arrival $t_f$ for a flight was set equal to the EENTRY field in the EDCT log files, but EENTRY is not actually the scheduled time of arrival; instead it is an estimate of the earliest time the flight can arrive at the FCA. Furthermore, when airlines choose to keep the default FSFS assignment of flights to slots, this is not recorded in the EDCT file. EDCT log files give an incomplete picture of how each airline used slots, which impacts the analysis presented here. Other sources of data that were used in computing the cost model values were Aircraft Situational Display to Industry (ASDI) files, OAG data about the number of seats on various aircraft types [99], and an average load factor computed by GRA [63].

At any time during an AFP, airlines can submit Simplified Substitution messages to the FAA that specify sets of flights and slots and a matching. Some airlines specify matchings frequently while others do so relatively rarely. A histogram of the number of matching messages for the 18 airlines in the data set is shown in Fig. 3.2. Larger numbers of matchings will lead to more meaningful results. There were Simplified Substitution messages specifying matchings for 18 airlines, but 11 of those airlines submitted matching messages fewer than 100 times in the 34 days in the data set.

Figure 3.2: Histogram of the number of matching messages for each airline in the data set.

Each Simplified Substitution message can specify as many flights and slots as the airline would like to match. Some airlines match many flights and slots, but more frequently only a few flights and slots are matched. Histograms of the number of flights and slots in the matchings submitted by the two airlines are presented in Fig. 3.3. In these histograms a bar to the right of 100 represents entries with more than 100 flights and slots. Matchings with more flights and slots reveal more about airline preferences than matchings with just a few flights and slots because airlines have only a few choices when there are only a few flights and slots. For both airlines, the majority of matchings involved fewer than 10 flights and slots. However, airline E had some matchings with more than 100 flights and slots.

[Histograms of the number of flights and slots matched per message for (a) Airline E and (b) Airline G.]
Figure 3.3: Histograms of the number of flights and slots for the matchings of two airlines.

3.5.2 Results

The cost models with the three largest $\hat\ell(\sigma^{2*})$ values for each airline, along with the corresponding estimates $\sigma^*/\bar{c}$, are shown in Table 3.5. Smaller $\sigma^*/\bar{c}$ values indicate that airline matchings are more likely with additive cost noise values that are relatively small compared to the deterministic part of the cost models. Most are between 0.1 and 0.7, but results in Section 3.4.3 suggest that these estimates are probably low.

Table 3.5: Cost Models with Largest Approximate Log-Likelihood

  Airline   1st   $\sigma^*/\bar{c}$   2nd   $\sigma^*/\bar{c}$   3rd   $\sigma^*/\bar{c}$
  A         7     0.487                6     0.539                1     1.216
  B         7     0.454                5     0.509                17    0.525
  C         6     0.587                16    0.615                7     0.623
  D         7     0.037                17    0.133                16    0.246
  E         6     0.369                16    0.336                17    0.351
  F         16    0.064                17    0.083                6     0.050
  G         2     0.051                17    0.164                8     0.185

© 2012 IEEE

Cost models 6, 7, 16, and 17, which are all based on Connection Delay, achieved relatively large approximate log-likelihood values for most airlines. These models attempt to capture the fact that delaying flights bound to hub airports is especially costly because such flights are likely to involve passengers, crews, and aircraft that need to connect to other flights. The cost was computed as the minutes of delay times a multiplier that is 2 for flights bound for high-connection-rate airports (hubs), 1.5 for flights bound for medium-connection-rate airports, and 1 for all other flights [113].

3.6 Conclusions

Valid airline decision models are essential for meaningful air traffic management research. In this chapter, airline decisions in Airspace Flow Programs were used to evaluate several proposed separable flight delay cost models. It was assumed that airlines solve a minimum cost perfect matching problem when matching flights to

slots. Unobserved aspects of airline costs were accounted for by adding a Gaussian noise term to the cost models. A heuristic was developed to find cost models and corresponding noise parameters that maximize an approximation of the log-likelihood of airline decision data. When applied to estimation problem instances based on linear programming problem coefficients and solution data generated with known noise parameters, the heuristic can more accurately estimate noise parameters than a simple simulation-based approach. Validation efforts based on synthetic airline decision data generated with known delay cost models and noise parameters demonstrated that the heuristic was in many cases able to correctly identify as most likely the delay cost model that was used to generate the synthetic data. However, the heuristic also under-estimated the magnitude of the cost noise variance for these problem instances. When applied to airline decision data from 32 Airspace Flow Programs in the summer of 2006, the heuristic found that costs that are proportional to the length of the delay, but with proportionality constants that are larger for flights bound to hub airports, maximize the approximation of the log-likelihood of the historical matchings of most airlines. Finally, the corresponding estimates of the standard deviations of the cost noise terms, expressed as a fraction of the average assignment cost for the historical matchings, ranged from 0.1 to 0.7.

Chapter 4

Ground Delay Program Implementation Models

4.1 Introduction

When predictions of air traffic and capacity suggest that an excessive number of flights will arrive at an airport at some future time, air traffic flow management (TFM) actions like Ground Delay Programs (GDPs) can be used to delay flights on the ground, where it is less expensive to absorb delay than in the air. These actions are typically used when a weather event such as high winds or low ceilings reduces the capacity of an airport. GDPs are used hundreds of times per year at some airports [110], and each GDP can generate thousands of minutes of ground delay. TFM actions are selected by human decision makers who must rely primarily on experience and intuition because available forecasts of traffic and weather do not adequately account for uncertainties and because they have access to few "what-if" simulation capabilities or other decision-support tools [64, 116]. Furthermore, differences in style or preferences may lead decision makers to select different actions even when confronted with the same situation [89]. Given the importance of TFM actions like GDPs and the variations in their usage that result from forecast uncertainties, the lack of decision-support tools, and differences in decision maker experience, style, or preferences, researchers have mined historical data in an effort to better understand and predict these actions.

CHAPTER 4. GDP IMPLEMENTATION MODELS 92

In particular, researchers have begun applying imitation learning techniques in which demonstration expert actions found in historical data are used to develop models that mimic expert actions [107]. Some studies have used traditional classification or clustering techniques to build models that predict or suggest TFM actions based on features that describe the state of the air traffic system [31, 86, 95, 129, 130]. For example, Mukherjee, Grabbe, and Sridhar compared the performance of two classification models (logistic regression and decision tree) that they trained to predict the probability of GDP implementation at an airport based on features describing the current weather and traffic state at the airport [95]. This approach to imitation learning is known as behavioral cloning (BC), and it assumes that expert actions can be characterized and modeled as a reaction to the current state. An alternative imitation learning approach is inverse reinforcement learning (IRL). IRL is based upon a model of the system dynamics and how these dynamics are affected by actions; a Markov decision process (MDP) model is typical. IRL assumes that the experts behave rationally in the sense that they select actions in an attempt to maximize a total reward accumulated over time while operating within the system model [84]. IRL uses demonstration expert actions found in historical data to infer a reward function that is consistent with the expert actions (assuming a strategic and total-reward-maximizing expert decision model). While BC can leverage powerful classification and clustering algorithms to make use of many features describing the system state, it does not consider system dynamics as explicitly as IRL does [107]. Furthermore, in problems described as "long-range" and "goal-directed," IRL has been shown to produce models that generalize to new environments better than models produced by BC [108].
All TFM problems certainly involve a dynamical system and, since GDPs can last for more than ten hours and cause ground delays for flights not scheduled to arrive at a constrained airport for several hours, the problems seem to be strategic. Although we are only aware of applications of IRL to systems considerably simpler than the selection of GDP actions, these characteristics of GDPs suggest that IRL may produce models that can predict and provide insight into expert GDP actions. Even if IRL models produce relatively poor predictions, the reward function inferred by IRL algorithms may provide a "succinct, robust, and transferable" definition of

when to implement GDPs [10]. Insights gleaned from such a reward function could guide researchers as they develop TFM decision-support tools [46, 89, 114].

Our objective in this chapter is to build descriptive models that can (1) predict GDP actions and (2) provide insight into how and why the actions are selected. While predictions may be valuable in decision-support tools for stakeholders influenced by GDPs, the insights could also help with the development of decision-support tools for those that implement GDPs. To build the models, we utilize historical data describing GDP actions and factors that might influence these actions. While this data-driven approach means that the models are necessarily rooted in the past and therefore will not account for the impact of new technologies or procedures, it also allows us to leverage powerful tools from machine learning, and it leads to quantified insights into historical GDP decision making.

In this chapter, we make four main contributions. First, we deploy both BC and IRL techniques to model and predict hourly expert GDP implementation actions. As far as we know, this is the first application of IRL to a problem in air traffic management, and by comparing these two fundamentally different approaches we gain greater insight into GDP implementation decision making. Second, by deploying an IRL technique, we infer a reward function that is consistent with historical GDP implementation actions. While utility functions that may guide runway configuration decisions have been inferred from historical data using discrete-choice models [104, 105], we are not aware of any other inference from historical data of a reward function that may be guiding TFM actions. Third, we analyze GDP implementation by focusing on important GDP initialization and GDP cancellation events, which occur ten or more times less frequently than the corresponding non-initialization or non-cancellation events at most airports.
This approach yields an operationally relevant evaluation of the predictive performance of models, as well as more nuanced insights into GDP implementation decision making. Other research that involves predicting GDP implementation has not distinguished between initialization and cancellation events (e.g., see [95]). Fourth, we use historical data to infer the degree to which future conditions are considered when GDP implementation decisions are made. Again, we are not aware of other work in which historical data was used to make this sort of inference.

The remainder of this chapter is structured as follows. Section 4.2 reviews the data we use in this analysis. Next, we specify the imitation learning GDP models that are developed in this research in Section 4.3. We conduct parametric studies related to decision maker look-ahead and evaluate the quality of model predictions in Section 4.4. We also investigate the models to glean insights into GDP implementation decision making in that section. Finally, we provide conclusions in Section 4.5. The material described in this chapter was initially published in [25] and [26]. The author of this dissertation made major contributions to the research presented in and the writing of these publications, and is the first author on both.

4.2 Data

In this section, we describe the data that we used to quantify the system state for GDP models attempting to predict GDP actions. Data were collected for EWR and SFO for the 151 days from 1 May 2011 through 29 September 2011, the 152 days from 1 May 2012 through 30 September 2012, and the 152 days from 1 May 2013 through 30 September 2013; there were 455 total days in the data set from these three summers. The causes of reduced airport capacity can vary between summer and winter months, so we simplified the problem by choosing to study only summer months. For example, some GDPs at EWR are reported as caused by "snow/ice," but by using only summer months we avoided the need to develop models that can account for such GDPs [110]. Night-time data were removed from consideration because, due to low traffic volumes, GDPs are typically not used during the night. Hourly samples of each type of data were generated from 11:00 UTC (7:00 am EDT) through 06:00 UTC (2:00 am EDT) on the next day (20 hours per day) for EWR and from 12:00 UTC (5:00 am PDT) through 09:00 UTC (2:00 am PDT) on the next day (22 hours per day) for SFO. Overall, this led to 9100 hourly data points for EWR and 10,010 for SFO.

We studied EWR and SFO because in recent years GDPs have been used at these airports more than at any other airports [110]. Indeed, during these 455 days, there were 208 and 297 GDPs utilized at EWR and SFO, respectively. Assuming no more

than one GDP was used per airport per day (as is typical), a GDP was utilized at EWR in more than 45% of these days and at SFO in more than 65% of these days. Another reason for selecting these airports is that we expected there to be meaningful differences in GDP decision-making practice for the two airports. This expectation was motivated in part by the different weather phenomena that typically lead to decreased arrival capacities at these two airports (winds at EWR but low ceilings at SFO) [110]. This expectation was also motivated by differences between the characteristics of arrival traffic demand at the two airports. For example, due to the relative proximity of EWR to multiple metropolitan areas in the eastern and midwestern United States, a larger fraction of flights bound to EWR than to SFO originate within a relatively short distance of the airport.

Each of the hourly data points contains hundreds of features. Some of these features described current conditions, such as the current airport configuration. Other features were based on predictions of future states. Examples of the latter include arrival schedules, which are a simple prediction of future arrival traffic levels, and weather forecasts, which provide predictions of future weather conditions. For each of the hourly data points, the set of predictions included predictions of the system state, beginning with the current time and extending in hourly increments to ten hours from the current time. Some of the predictions were for a specific moment in time, while others applied to a specific one-hour interval of time. In all, 257 features, which will be described in the following sections, were potentially available to the GDP models. However, based on the results of experiments described in Section 4.4.1, we eventually only included features extending to the hour starting four hours from the current time. As a result, 125 features were ultimately made available to the GDP models.
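As a quick sanity check, the hourly sample counts quoted above follow directly from the day counts and the hourly sampling windows:

```python
# Day counts for the three summers and the hourly sampling windows give the
# totals quoted in the text.
days = 151 + 152 + 152       # summers of 2011, 2012, and 2013
ewr_hours_per_day = 20       # 11:00 UTC through 06:00 UTC the next day
sfo_hours_per_day = 22       # 12:00 UTC through 09:00 UTC the next day

assert days == 455
assert days * ewr_hours_per_day == 9100    # EWR hourly data points
assert days * sfo_hours_per_day == 10010   # SFO hourly data points
```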
It may seem that by using hundreds of features to describe the state we were providing the models with a relatively complete picture of the factors that influence GDP decision making. However, GDP decision making is so complex and can depend on so many factors that this is certainly not the case. Some factors that may impact GDP decision making had to be left out because data quantifying them are not readily available. For example, an arrival by Air Force One can reduce the arrival capacity

of an airport and lead to the need for a GDP. Stakeholder (i.e., airline) preferences expressed during a planning teleconference can also impact GDP implementation decisions, but detailed minutes of these teleconferences are not available, and even if they were, it may be difficult to meaningfully quantify such expressions of preferences. To focus and simplify this study, we also chose not to consider certain other factors that impact GDP decision making, even though they are quantified in available data. For example, to account for how operations at nearby airports impact operations at EWR and SFO, we might have added features describing runway configurations at these other airports. The exclusion of these runway configuration data means that the models do not account for the impact of operations at nearby airports.

4.2.1 Weather Observations

Observations of weather conditions at an airport are recorded in aviation routine weather reports (commonly referred to as METAR reports), which we retrieved from the FAA’s Aviation System Performance Metrics (ASPM) database [58]. These data include the meteorological conditions (instrument or visual), ceiling height and its change since the last hour, visibility distance, wind speed, wind angle, and landing runway head wind and landing runway cross wind for the active runway configuration (eight features).

4.2.2 Weather Forecast

Terminal Aerodrome Forecasts (TAFs) provide predictions of weather conditions at an airport that extend 24 hours or more into the future, and they are often used by TFM decision makers. At EWR and SFO, TAFs were published at least every three hours during the time period we studied. From TAFs, we extracted predictions of conditions at hourly intervals, beginning with the current hour and continuing until the hour starting 10 hours from the current time. For each of these 11 hours, we included a feature for the meteorological conditions (visual, marginal visual, instrument, or low instrument), ceiling height, change in ceiling height over the last hour, whether ceiling heights were “temporary,” wind speed, wind direction, wind gust speed, landing runway cross wind

speed for the most common runway configuration, landing runway head wind speed for the most common runway configuration, visibility distance, whether the visibility distance was greater than the specified quantity, whether the visibility conditions were “temporary,” the intensity of precipitation, the intensity of obscuration, and whether precipitation and obscuration were “temporary.” There were 14 features specified for 11 hours and one (change in ceiling height) for 10 hours, which means 164 features were derived from the TAF for each hourly data point.

4.2.3 Number of Scheduled Arrivals

ASPM records also contain the number of scheduled arrivals at the airport during each quarter hour. We used these records to generate a feature denoting the scheduled arrivals during the hour starting at the current time and extending through the hour starting 10 hours from now (11 features).

4.2.4 Current Airport State

Other features that describe the current state of the airport were also extracted from ASPM records; two of these features were the airport arrival rate (AAR) for the current hour and the runway configuration. The deterministic departure queue is a third such feature. It was calculated by constructing a simple deterministic queuing model based on the scheduled number of departures and the airport departure rate. Large values for this feature might be correlated with surface movement congestion and long takeoff queues.
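As an illustration, a deterministic departure queue of this kind can be sketched as a queue that accumulates scheduled departures and drains at the airport departure rate each period. The function name and period granularity here are hypothetical, not the dissertation's implementation:

```python
def departure_queue(scheduled_departures, departure_rates):
    """Deterministic queue: in each period, scheduled departures join the
    queue and at most the airport departure rate leaves it."""
    queue = 0
    levels = []
    for sched, rate in zip(scheduled_departures, departure_rates):
        queue = max(queue + sched - rate, 0)
        levels.append(queue)
    return levels
```

For example, with demand of 10, 10, and 2 departures against a rate of 8 per period, the queue builds to 2, then 4, then empties.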

4.2.5 Predictions of Future Airport Arrival Rates

GDPs are typically used when an airport’s arrival capacity, quantified by the AAR, is expected to be too low to handle the predicted number of arrivals. The AAR selected by traffic managers depends on factors such as runway configurations, weather conditions, and the type of aircraft that are scheduled to arrive at the airport. Other researchers have developed AAR prediction models [39, 48, 50, 88, 102, 115, 127, 128],

and we implemented a model similar to the bagged decision tree model proposed by Provan, Cunningham, and Cook in [102] and [48]. This type of model worked well, not only for Provan, Cunningham, and Cook, but also for Wang in [127] and [128]. The AAR predictions from this model for consecutive one-hour periods, the first starting one hour from the current time and the last starting ten hours from the current time, were provided to the GDP models (ten features).
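A bagged decision tree AAR predictor in the spirit of the models cited above can be sketched with scikit-learn, whose BaggingRegressor uses decision trees as its default base estimator. The features and AAR targets below are synthetic placeholders, not the dissertation's data:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 5)                    # hypothetical hourly weather/configuration features
y = 30 + 10 * X[:, 0] + rng.randn(200)  # hypothetical AARs (arrivals per hour)

# Bagged decision trees: each tree is fit on a bootstrap sample of the
# training data, and predictions are averaged across trees.
model = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = model.predict(X[:3])
```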

4.2.6 Reroutes

Reroute advisories were collected from a database of advisories posted on an FAA website [11] and used to construct features describing reroutes required or recommended for flights bound to the airport. GDPs are sometimes used to help address reductions in the capacity of airspace typically used by flights headed to the airport, such as airspace near arrival fixes. This sort of capacity reduction may also result in reroutes, so these features were added in an attempt to quantify reductions in airspace capacity that may cause GDP implementation. More precisely, for the hour starting at the current time through the hour starting 10 hours from the current time, there were 4 reroute-related features (44 total). Two of these reported the number of departure Air Route Traffic Control Centers (ARTCCs or Centers) and departure airports for which reroutes are recommended for flights bound to the airport. The other two features reported the number of departure Centers and departure airports for which reroutes were required for flights bound to the airport. These features by no means completely describe congestion in relevant airspace, however; additional features could be added to describe it more fully.

4.2.7 Previous GDP Plan

GDP plan data was also retrieved from the advisory database. While GDP plans are sometimes modified, selected GDP actions often are a continuation of a previously-announced plan. For each hour, we quantified the GDP action that would be pursued at the start of the hour, assuming the GDP plan as specified one hour ago were simply continued. The features describing the previous plan included whether or not there

would be a GDP implemented, the GDP scope, the number of hours until the first hour of controlled GDP rates, the number of hours until the last hour of controlled GDP rates, and the GDP rates that should be in place for each relevant hour of the GDP (from the hour starting at the current time through the hour starting 10 hours from the current time). This means that there were 15 features describing the previous GDP plan for each hourly time step.

4.2.8 Ground and Air Buffers

One of the fundamental purposes of a GDP is to prescribe ground delay—which is cheaper than delay absorbed in the air—when some delay must be absorbed. Therefore, we defined a simple deterministic queuing network model with buffers for flights delayed on the ground and in the air. It provides estimates of two quantities that are essential to GDP planning: how much delay is absorbed on the ground and in the air as a result of GDP actions. This model is similar to the deterministic queuing model utilized by Kim and Hansen in [81], the stochastic queuing model proposed by Odoni and discussed by Ball et al. in Section 4.3 of [15], and the model used in a stochastic ground-holding problem specified by Ball et al. in [17]. Although this model certainly fails to capture several relevant aspects of GDPs (such as differences in flight times between flights), our hope is that it captures enough of the relevant characteristics of the real world to be useful for GDP analytics but without the burden of unnecessary complexity. At the start of each day, the model initializes the ground and air buffers to zero. The number of flights in the ground buffer at the start of time step $t$ is $b_t^g$ and the number in the air buffer is $b_t^a$. The system dynamics specify that the buffer levels at the start of the next time step ($b_{t+1}^g$ and $b_{t+1}^a$) depend on the scheduled arrivals ($\text{scheduled}_t$) and AAR ($\text{AAR}_t$) during time step $t$, as well as $b_t^g$, $b_t^a$, and the GDP action implemented in this time step. If a GDP is implemented during $t$ ($\text{GDP}_t$), then the relevant components of the GDP action are whether or not the arrival rate is controlled by the GDP during $t$ ($\text{controlled}_t$) and the controlled arrival rate during

$t$ ($\text{rate}_t$). The buffer levels are updated according to

$$b_{t+1}^g = \begin{cases} \left[b_t^g + \text{scheduled}_t - \text{rate}_t\right]_+ & \text{if } \text{GDP}_t \text{ and } \text{controlled}_t \\ 0 & \text{else} \end{cases} \qquad (4.1)$$

and 

$$b_{t+1}^a = \begin{cases} \left[b_t^a + \min(b_t^g + \text{scheduled}_t, \text{rate}_t) - \text{AAR}_t\right]_+ & \text{if } \text{GDP}_t \text{ and } \text{controlled}_t \\ \left[b_t^a + b_t^g + \text{scheduled}_t - \text{AAR}_t\right]_+ & \text{else.} \end{cases} \qquad (4.2)$$

Here $[x]_+$ is equal to $x$ if $x \geq 0$ but equal to $0$ if $x < 0$. Fig. 4.1 depicts this simple queuing model when a GDP is implemented.
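A minimal Python sketch of one step of these buffer dynamics (the function and argument names are ours, not from the dissertation):

```python
def update_buffers(bg, ba, scheduled, aar, gdp, controlled, rate):
    """One step of the ground/air buffer dynamics of Eqs. (4.1)-(4.2)."""
    pos = lambda x: max(x, 0)  # the [x]+ operator
    if gdp and controlled:
        # Flights beyond the controlled rate are held on the ground (4.1);
        # released flights exceeding the AAR accumulate in the air buffer (4.2).
        bg_next = pos(bg + scheduled - rate)
        ba_next = pos(ba + min(bg + scheduled, rate) - aar)
    else:
        # Without a controlled rate, everything is released; any excess
        # over the AAR accumulates in the air buffer.
        bg_next = 0
        ba_next = pos(ba + bg + scheduled - aar)
    return bg_next, ba_next
```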


Figure 4.1: Ground and air buffer system model.

4.3 GDP Models

The structure we specified and used, both for the BC and the IRL GDP models, is depicted in Fig. 4.2. The input data that can be used by the models is described in Section 4.2. The output of the model is either a prediction that a GDP would not be implemented in the state quantified by the input data, or a prediction that a GDP would be implemented, along with predictions of the GDP parameters that would be used in the state quantified by the input data. The GDP parameters include the GDP scope, the time when the controlled rates in the GDP begin (the GDP start time), the number of hours of enforced GDP rates (the GDP duration), and the enforced rate for each hour in the GDP duration, extending from the hour starting at the current time through the hour starting 10 hours in the future (up to 11 hourly rates).

There are two sub-models: the GDP Implemented model predicts whether or not a GDP will be implemented and the GDP Parameters model predicts GDP parameters when a GDP is initialized. The GDP model is a simplification of reality in that it requires that a GDP plan either progresses as planned or is canceled; no modifications or extensions are permitted.

[Figure: input features (weather observations, weather forecast, traffic schedule, AAR, runway configuration, departure queue, predicted AARs, reroutes, previous GDP plan, ground and air buffers) are provided to the GDP Implemented model, which either outputs a plan not to implement a GDP or invokes the GDP Parameters model to select the scope, start time, duration, and rates of a GDP plan.]

Figure 4.2: Structure of the GDP model.

4.3.1 GDP Implemented Models

A GDP Implemented model predicts whether or not a GDP will be implemented during a given hour, given a set of features describing the state (see Section 4.2). We

developed and compared BC and IRL GDP Implemented models. The BC and IRL algorithms used to train these models are discussed in the next two sections.

BC: Random Forest Classifiers for Cancellation and Initialization

The structure of the BC model developed and analyzed for this research is depicted in Fig. 4.3. Depending on whether or not the previous GDP plan specified that a GDP would be implemented in the current hour, either a GDP Cancellation or GDP Initialization model is invoked. We hope that creating specific models for what seem to be different decisions (GDP cancellation and GDP initialization) will lead to better predictive performance, as well as to more refined insight into GDP decision making. An additional motivation for building separate models for GDP cancellation and GDP initialization is that doing so facilitates the use of over- and under-sampling to generate custom training data sets for each model, which helps with the difficult imbalanced classification problem each model faces. The GDP Initialization model is provided with features describing the current state at the airport, but no features describing the previous GDP plan, because the previous plan was to not use a GDP. It then predicts either that a GDP will be initialized or that no GDP will be initialized. The GDP Cancellation model is provided with features describing not only the current state at the airport, but also features describing the previous GDP plan, such as the scope, planned end time, and rates. It then predicts either that the GDP will continue as planned or that it will be canceled. The GDP Cancellation model is not used in hours that are immediately after the planned end of a GDP because not implementing a GDP in that hour is not a cancellation. The question in those hours is whether or not a new GDP will be initialized, so the GDP Initialization model is invoked instead. The GDP Cancellation and GDP Initialization models are both random forest classification models, implemented with the RandomForestClassifier class available in the scikit-learn package for the Python programming language [100].
Random forest models were selected because they typically perform well with minimal tuning of the algorithms that train the models, they generally do not over-fit to the training data even when provided with many features, and other researchers have found that

[Figure: depending on whether the previous GDP plan called for a GDP in the current hour, either the GDP Cancellation model or the GDP Initialization model is invoked. Both receive features describing the current state (weather observations, weather forecast, traffic schedule, AAR, predicted AARs, runway configuration, departure queue, reroutes, ground and air buffers); the Cancellation model also receives the previous GDP plan. The Cancellation model predicts whether the GDP continues or is canceled, while the Initialization model predicts whether no GDP or a GDP is initialized.]

Figure 4.3: Structure of the BC GDP Implemented model.

related models predict AARs better than alternative models [71, 102, 127, 128]. When we analyzed all of the hours in which a GDP was planned to be implemented, we found that a GDP was canceled in only about one out of ten of these hours. Similarly, when we analyzed all of the hours in which a GDP was not planned to be implemented, we found that a GDP was initialized in only about one out of thirty of these hours. Predictive performance can sometimes be enhanced by using the Synthetic Minority Over-Sampling Technique (SMOTE) when facing imbalanced data sets such as these [43]. We used an implementation of the SMOTE algorithm to generate synthetic minority class (initialization and cancellation) data points [77], and we also under-sampled the majority class data points. In particular, the number

of minority samples was doubled by using SMOTE to generate synthetic samples, and the majority samples were under-sampled so that there were four times as many (for EWR) or twice as many (for SFO) majority-class samples as (real and synthetic) minority-class samples.
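This resampling scheme can be sketched as follows. The interpolation here is a simplified SMOTE-like step (real SMOTE interpolates toward k-nearest neighbours, and we used an existing implementation [77]), and the data are synthetic stand-ins for the GDP features:

```python
import numpy as np

def smote_like(X_min, rng):
    """Double the minority class by interpolating each sample toward a
    randomly chosen other minority sample (simplified SMOTE)."""
    partners = rng.randint(0, len(X_min), size=len(X_min))
    lam = rng.rand(len(X_min), 1)
    synthetic = X_min + lam * (X_min[partners] - X_min)
    return np.vstack([X_min, synthetic])

rng = np.random.RandomState(0)
X_min = rng.rand(30, 4)    # hypothetical minority (initialization/cancellation) samples
X_maj = rng.rand(500, 4)   # hypothetical majority samples

X_min_aug = smote_like(X_min, rng)  # minority count doubled
# Under-sample the majority to a 4:1 ratio, as used for EWR (2:1 for SFO).
keep = rng.choice(len(X_maj), size=4 * len(X_min_aug), replace=False)
X_maj_sub = X_maj[keep]
```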

IRL: Cascaded Supervised Inverse Reinforcement Learning

The IRL algorithm selected for evaluation is the Cascaded Supervised Inverse Reinforcement Learning (CSI) algorithm proposed by Klein et al. in [82]. This approach was selected for several reasons: it is relatively easy to implement, it does not require multiple computations of an optimal policy for various possible reward functions, and it leverages existing classification and regression algorithms. Furthermore, it does not involve exploring the entire state space—just the states visited in the training data and possibly in some additional simulations. Klein et al. provide a theoretical guarantee for CSI in Theorem 1 of [82]: the expert policy (the policy that produced the demonstrations in the data) is near-optimal for the reward function estimated by the CSI algorithm. Although the authors provide no guarantee that the inferred reward function will be non-trivial (i.e., not equal to 0), they at least show that it will likely be meaningful when the CSI sub-algorithms perform perfectly (see Corollary 1 and related text in [82]). One downside of the algorithm is that it assumes a deterministic expert policy that is optimal for a certain reward function, which is probably not the case for GDP decision making. This drawback did not prevent us from selecting the CSI algorithm, however, because for this initial investigation we are working with deterministic GDP Implemented models. There are six main elements involved in CSI. Each of these elements will be described in the subsequent paragraphs. Figure 4.4 depicts the flow of data, functions, policies, and models through the various algorithms that make up CSI. System Model: The CSI algorithm, like any other IRL algorithm, requires a system model. We utilized an extension of the simple deterministic queuing model described in Section 4.2.8 as the system model. The system state at time step $t$ is $s_t$, which is a member of the set of all possible states $\mathcal{S}$. It involves an exogenous state $s_t^e$ and a controlled state $s_t^c$: $s_t = (s_t^e, s_t^c)$. The exogenous state is a vector of features describing

[Figure: training data $(s_t, a_t)\,\forall t$ is passed to an SFMC$^2$-training algorithm, yielding a score function $q$ and an SFMC$^2$ policy $\pi_C(s) \in \operatorname{argmax}_a q(s, a)$; reward samples $\hat{r}_t = q(s_t, a_t) - \gamma q(s_{t+1}, \pi_C(s_{t+1}))$ are then generated for all $t$ and passed to a regression algorithm, yielding a reward regressor $\hat{R}_C(s, a) = \hat{\theta}^\top f(s, a)$; finally, an algorithm using the ground and air buffer dynamics derives an approximately-optimal policy $\hat{\pi}^*_{\hat{R}_C}$.]

Figure 4.4: Structure of the CSI algorithm.

the weather observations, the weather forecast, the traffic schedule, the current AAR, the runway configuration, predicted AARs, and reroutes (see Section 4.2). GDP actions have no impact on the exogenous state in this model. The controlled state, on the other hand, is impacted by GDP actions. It contains the GDP action for this time step prescribed by the previous GDP plan ($\text{plan}_t$), $b_t^g$, and $b_t^a$. Therefore, $s_t^c = (\text{plan}_t, b_t^g, b_t^a)$.

The action $a_t$ taken in $t$ is binary: it specifies either that no GDP is implemented or that a GDP is implemented in this time step. If a GDP is implemented, then GDP parameters are selected by the GDP Parameters model discussed in Section 4.3.2, not by the IRL GDP Implemented model discussed in this section. The selected

action is a member of the set of all possible actions: $a_t \in \mathcal{A} = \{\text{GDP}, \text{no GDP}\}$. In conjunction with the output of the GDP Parameters model, this action will impact the next controlled state as prescribed by the queuing system dynamics described in Section 4.2.8. There are also system dynamics influencing how $s_t^e$ influences $s_{t+1}^e$, but we speculate that these are much too complicated for us to model. The CSI algorithm allows us to implicitly learn these dynamics from state transitions recorded in the training data. Let Eqs. (4.1) and (4.2), along with the unknown dynamics describing how $s_t^e$ influences $s_{t+1}^e$, be denoted by $P$, where $P(s'|s, a) = \text{Prob}(s_{t+1} = s'|s_t = s, a_t = a)$ for states $s, s' \in \mathcal{S}$ and action $a \in \mathcal{A}$. In this research, we assume that we do not have a full specification of $P$. This is a Markov model because the conditional distribution of future states depends only on the current state and action, not on the whole history of states and actions. Decision Process Model: Further modeling is required to specify the decision process we assume is utilized when GDP implementation actions are selected. We assume that traffic managers are attempting to maximize the expected value of a discounted infinite sum of future rewards while operating in the system model described in the previous paragraphs. More precisely, they seek a deterministic policy mapping states to actions that maximizes

$$\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t)\right], \qquad (4.3)$$

where $\gamma \in [0, 1]$ is the discount factor and $R: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reward for each time step. The tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$ defines the MDP model we utilized in this research. Classifier Policy and Estimation of Optimal State-Action Value: The next step in

CSI is to derive a deterministic classifier policy $\pi_C: \mathcal{S} \to \mathcal{A}$ (the first box in Fig. 4.4). This classifier policy can be any score function–based multi-class classifier (SFMC$^2$). An SFMC$^2$ predicts an action that achieves the largest score according to some score function $q(s, a)$:

$$\pi_C(s) \in \operatorname*{argmax}_{a \in \mathcal{A}} q(s, a). \qquad (4.4)$$

By interpreting the score function $q(s, a)$ as an optimal state-action value function for $\pi_C$, the CSI algorithm can use the score function and the policy to generate a reward sample data point $\hat{r}_t$ corresponding to each state-action pair (the second box in Fig. 4.4). These reward samples were used to train a reward regressor in the next step of the algorithm (the third box in Fig. 4.4). We utilized a model similar to the random forest BC GDP Implemented model described in Section 4.3.1 for the classifier policy $\pi_C$. The classifier policy differs in that it uses a single model to predict GDP implementation, not separate models for initialization and cancellation. Furthermore, we did not adjust the training data by over- or under-sampling. The score function for a given state and action is the average over all the trees in the random forest of the fraction of members in the leaf nodes for which the action was selected. The action maximizing this score is returned as the prediction of this model, so it qualifies as an SFMC$^2$ policy. This score is always in $[0, 1]$, and it is conveniently accessed by requesting the predicted probability of an action from the RandomForestClassifier class [100]. Reward Estimation with Regression: The next step in CSI is to use a regression algorithm to estimate a reward function $R_C$ that is consistent with the score function $q$ when it is interpreted as an optimal state-action value function for $\pi_C$ [82] (the second and third boxes in Fig. 4.4). Given some policy $\pi$, a unique state-action value function $Q_R^\pi(s, a)$ corresponding to a reward function $R$ satisfies the Bellman equation:

$$Q_R^\pi(s, a) = R(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s'|s, a)\, Q_R^\pi(s', \pi(s')). \qquad (4.5)$$

Furthermore, an optimal policy $\pi_R^*$ maximizing the total discounted future reward objective (4.3) will satisfy

$$\pi_R^*(s) \in \operatorname*{argmax}_{a \in \mathcal{A}} Q_R^{\pi_R^*}(s, a). \qquad (4.6)$$

If we assume that $\pi_C$ is an optimal policy with respect to a reward $R_C$, then we can interpret $q$ as a proxy for the optimal state-action value function $Q_{R_C}^{\pi_C}$ because of the similarity between Eqs. (4.4) and (4.6). Furthermore, under these assumptions we can express a relationship between $\pi_C$, $R_C$, and $Q_{R_C}^{\pi_C}$ by substituting them into the Bellman Eq. (4.5). By doing so, using $q$ as a proxy for $Q_{R_C}^{\pi_C}$, and then solving for $R_C(s, a)$, we get

$$R_C(s, a) = q(s, a) - \gamma \sum_{s' \in \mathcal{S}} P(s'|s, a)\, q(s', \pi_C(s')). \qquad (4.7)$$

If we can find an $R_C$ such that Eq. (4.7) holds for all state-action pairs, then $q$ is not just a proxy for the optimal state-action value function $Q_{R_C}^{\pi_C}$—the two are equal because the Bellman equation holds with $q$ in the place of $Q_{R_C}^{\pi_C}$ and because $Q_{R_C}^{\pi_C}$ is unique (given $\pi_C$ and $R_C$). Therefore, $\pi_C$ will be optimal with respect to this $R_C$ (due to the identical forms of Eqs. (4.4) and (4.6)). Furthermore, $\pi_C$ was constructed to behave like the expert demonstrations, so the expert policy will also be near-optimal with respect to this $R_C$. Roughly speaking, the more states for which $\pi_C$ and $\pi_E$ prescribe the same action, the closer the expert policy will be to optimal with respect to $R_C$ (see the proof of Theorem 1 in [82] for details). Unfortunately, we do not know the transition probabilities required to compute $R_C$ directly with Eq. (4.7). However, using Eq. (4.7) and state-action samples, we can generate a data set consisting of state-action pairs $(s_t, a_t)$ and corresponding reward samples $\hat{r}_t$ (the second box in Fig. 4.4), where the reward samples are computed with

$$\hat{r}_t = q(s_t, a_t) - \gamma\, q(s_{t+1}, \pi_C(s_{t+1})). \qquad (4.8)$$
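As a sketch, the reward-sample generation of Eq. (4.8) might look like the following; `q` and `policy` stand in for the score function $q$ and classifier policy $\pi_C$, and the function name is ours:

```python
def reward_samples(states, actions, q, policy, gamma):
    """Compute r_t = q(s_t, a_t) - gamma * q(s_{t+1}, policy(s_{t+1}))
    for each consecutive pair of states along a trajectory."""
    return [q(s, a) - gamma * q(s_next, policy(s_next))
            for s, a, s_next in zip(states[:-1], actions[:-1], states[1:])]
```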

This data set is then used to train a regressor $\hat{R}_C$ that approximates $R_C$ (the third

box in Fig. 4.4). Any type of regressor can be used, but we used a linear regressor model (as implemented in the OLS class of the statsmodels Python package [5]). Linear models are relatively easy to interpret, so a trained linear regressor model should be relatively rich in insight. Furthermore, some proposed TFM and GDP optimization approaches assume or even require a reward function that is linear in the optimization decision variables (examples of such approaches are described in [15] and [116]). Although using a linear regressor does not guarantee that the inferred reward function will be directly applicable in these optimization approaches, we speculate that using a linear form will facilitate the use of our results by those seeking an optimization objective that is inferred from historical TFM actions. The form of a linear regressor model is $\hat{R}_C(s, a) = \hat{\theta}^\top f(s, a)$, where $f(s, a) \in \mathbb{R}^M$ is a vector of reward features for the state-action pair $(s, a)$ and $\hat{\theta} \in \mathbb{R}^M$ is a vector of parameters estimated by the regression algorithm. This regressor takes as inputs state-action pairs and returns rewards such that the actions proposed by $\pi_C$ are near-optimal.
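A least-squares fit of this linear form can be sketched with NumPy (the dissertation used statsmodels' OLS; the feature matrix and reward samples below are synthetic placeholders):

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical reward feature matrix: a constant column plus M = 3
# reward features f(s, a) for 100 state-action samples.
F = np.hstack([np.ones((100, 1)), rng.randn(100, 3)])
theta_true = np.array([0.5, -1.0, 2.0, 0.0])
r_hat = F @ theta_true + 0.01 * rng.randn(100)  # noisy reward samples

# Ordinary least squares estimate of the parameter vector theta.
theta_est, *_ = np.linalg.lstsq(F, r_hat, rcond=None)
```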

Roughly speaking, the better the fit achieved by the regressor, the closer $\pi_C$ will be to optimal with respect to it (see the proof of Theorem 1 in [82] for details). When generating this reward data set, we used the exogenous states in the training data, and actions and corresponding controlled states produced by simulating a randomized policy. The randomized policy randomly selects $\pi_C$ or a naïve policy for use in each time step with equal probability. Regardless of the state, the naïve policy simply selects GDP initialization and GDP cancellation actions with probabilities equal to the frequency of these events in the training data. Klein et al. suggest this sort of approach to increase the diversity of the training data in an effort to increase the generality of the resulting reward model (see [82], Section 6). A reward regressor trained only on expert-like actions will not encounter many low-reward state-action pairs, and will therefore not know how to evaluate those when they are encountered. The reward features we utilized were inspired by the objective function used in the GDP Parameter Selection Model [46] and also by performance measures suggested by Ball et al. [15]. The eight reward features were ground and air buffer levels at the end of the current time step, the change in the air and ground buffers during this time step, an indicator that the air buffer will be greater than or equal to five

at the end of the current time step, the number of arrivals during the time step, the number of unused arrival slots while the arrival rate is controlled by a GDP (i.e., while $\text{controlled}_t$ is true), and the canceled duration of a canceled GDP. Before training the regressor, a constant reward feature was added and the scalar-valued reward features were standardized over the time steps for which they are defined, while the lone indicator feature (which only takes values of zero or one) was not standardized. Derivation of Approximately-Optimal Policy: As depicted in the fourth and bottommost box in Fig. 4.4, we derived a policy $\hat{\pi}^*_{\hat{R}_C}$ that attempts to maximize the objective in Eq. (4.3), defined based on $\hat{R}_C$, using an approximate dynamic programming approach known as rollouts (see [21], Section 6.4). Ultimately, the prediction produced by the CSI model of whether or not a GDP will be implemented when in state $s$ will be the action returned by $\hat{\pi}^*_{\hat{R}_C}(s)$. Other techniques from reinforcement learning and approximate dynamic programming could be used to derive this policy. We selected rollouts because we could use weather forecast and traffic schedule data already in the system state to perform the simulations required when estimating optimal state-action value functions, which made deriving an approximately-optimal policy tractable in spite of the unknown state transition probabilities. We will discuss our implementation of the rollouts algorithm in detail in the subsequent paragraphs. The idea behind the rollouts algorithm is to use simulations to construct an estimate $\hat{Q}^*_{\hat{R}_C}(s, a)$ of the optimal state-action value function for each action $a \in \mathcal{A}$ that can be taken from the current state $s$. The policy then selects an action that maximizes $\hat{Q}^*_{\hat{R}_C}(s, a)$ from the current state:

$$\hat{\pi}^*_{\hat{R}_C}(s) \in \operatorname*{argmax}_{a \in \mathcal{A}} \hat{Q}^*_{\hat{R}_C}(s, a). \qquad (4.9)$$

Each simulation makes use of a base policy operating in a simplified version of the MDP. We used a simple deterministic classifier policy $\bar{\pi}_C$ as the base policy. If we were simulating to estimate $\hat{Q}^*_{\hat{R}_C}(s, a)$ when $a = \text{GDP}$, then we called the GDP Parameters model to get a plan, and required $\bar{\pi}_C$ to execute the plan as specified in the simulation. After the GDP completed as planned, the policy selected at each time step whether or not to implement a GDP, and the current GDP rate, with random

forest classification and regression algorithms similar to those used in the BC GDP Implemented model and the GDP Parameters model. An important difference is that the base policy was provided only with instantaneous features for which forecasts or predictions were available within the current state. This allowed us to avoid simulating the system dynamics, described by the unknown state transition probabilities $P$, by instead assuming that traffic and weather will occur as described by the forecasts and predictions contained in the current state. Similarly, if we were simulating to estimate $\hat{Q}^*_{\hat{R}_C}(s, a)$ when $a = \text{no GDP}$, then these classification and regression algorithms determined whether or not to use a GDP and the GDP rate, respectively, for each time step in the rollouts simulation. We used rollouts simulations that are just one time step long before estimating the optimal state value from the final state of the simulation $\tilde{s}$ as $\max_{a \in \mathcal{A}} \bar{q}(\tilde{s}, a)$, where $\bar{q}$ is the score function for $\bar{\pi}_C$. We chose short simulations in part because our parametric investigation of $\gamma$ led us to use relatively low $\gamma$ values (see Section 4.4.1). Low $\gamma$ values mean that rewards earned after the current time step do not significantly impact decisions, implying that longer rollouts simulations would typically be superfluous.
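A one-step rollout of this kind can be sketched generically as follows; `reward`, `step`, and `q_bar` stand in for $\hat{R}_C$, a one-step simulation under the base policy's simplified dynamics, and the base-policy score function $\bar{q}$, and all names are ours rather than the dissertation's:

```python
def rollout_value(s, a, actions, reward, step, q_bar, gamma):
    """One-step rollout estimate of the state-action value:
    Q(s, a) ~= R(s, a) + gamma * max_a' q_bar(s', a')."""
    s_next = step(s, a)  # simulate a single transition
    return reward(s, a) + gamma * max(q_bar(s_next, ap) for ap in actions)

def rollout_action(s, actions, reward, step, q_bar, gamma):
    """Select the action maximizing the rollout estimate, as in Eq. (4.9)."""
    return max(actions,
               key=lambda a: rollout_value(s, a, actions, reward, step, q_bar, gamma))
```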

4.3.2 GDP Parameters Model

As depicted in Fig. 4.2, a GDP Parameters model was developed to predict the GDP scope, GDP start time, GDP duration, and GDP rates. This model is only used in the final policy estimation step of the CSI algorithm, described in Section 4.3.1. Random forest regressor BC models were used for each sub-model that predicts one of the GDP parameters, and the predictions were rounded to the nearest typical value for the parameter in question. Random forest models were selected because they perform well with relatively little tuning of the algorithm that trains the models, they generally do not over-fit to the training data even when provided with many features, and other researchers have had some success using related models to predict AARs [71, 102, 127, 128]. Random forest regression models for these parameters were trained with the RandomForestRegressor class in the scikit-learn Python package [100] using settings and parameters similar to those suggested in [102]. Although these models are not

the focus of this research, we are not aware of previous attempts at building models that predict GDP parameters.

4.4 Experiments

Ten-fold cross validation was used in these experiments (see [71], Section 7.10.1). The folds were defined based on days in the data set rather than individual time steps (hours). With 455 days in the data set and ten folds, each fold consisted of all of the time steps in around 46 days. The experiments focused on the GDP Implemented models.
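The day-based fold construction can be sketched with scikit-learn's GroupKFold, which keeps all samples sharing a group label (here, a day) within a single fold; the data below are placeholders for the hourly data points:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

hours = np.arange(24 * 50)  # hypothetical: 50 days of hourly data points
days = hours // 24          # group label: the day each hour belongs to

splits = list(GroupKFold(n_splits=10).split(hours, groups=days))
for train_idx, test_idx in splits:
    # no day contributes hours to both the training and the test fold
    assert set(days[train_idx]).isdisjoint(days[test_idx])
```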

4.4.1 Parametric Studies Related to Look-ahead

Characteristics of both BC and IRL models provide insight into the degree to which decision makers are looking ahead into the future when selecting actions. Both models were provided with features that quantify predictions of future weather conditions, future scheduled traffic, and predictions of future airport arrival rates. The data set included predictions extending ten hours into the future, but if decision makers only consider predictions extending, say, five hours into the future, then providing the models with the remaining predictions will not improve, and may even harm, the quality of model predictions. Therefore, we conducted a parametric study to determine the impact of changing how far into the future these predictions extend (a quantity we refer to as the look-ahead time horizon) on GDP implementation prediction quality. Furthermore, for the IRL model, the discount factor $\gamma$ quantifies the relative importance of rewards accrued now and rewards accrued in the future. Therefore, we also conducted a parametric study to determine the impact of changing $\gamma$ on reward regressor prediction quality.

BC: Look-ahead Time Horizon Parametric Analysis

A parametric analysis was performed to determine the impact of changing the look-ahead time horizon on the predictive performance of the BC GDP Implemented model. That is, we changed the number of features provided to the model by removing features describing predictions beyond a certain time horizon in the future, and we recorded the performance of the model while varying this time horizon between zero and ten hours. To evaluate predictive performance, we used the F1-score because it is an appropriate metric for imbalanced data sets. It ranges between zero and one, with larger values indicating better performance.
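The horizon truncation described above amounts to filtering the feature set by the integer suffix that marks how many hours ahead a prediction applies. A minimal sketch, assuming an underscore-suffix naming convention purely for illustration (the dissertation's actual feature names use a trailing space before the integer):

```python
def truncate_lookahead(feature_names, horizon):
    """Keep a feature unless its name ends in '_k' for an integer k beyond the horizon.

    Features whose names end with an integer k are predictions for the hour
    starting k hours from the time of the prediction.
    """
    kept = []
    for name in feature_names:
        base, _, suffix = name.rpartition("_")
        if base and suffix.isdigit() and int(suffix) > horizon:
            continue  # prediction beyond the look-ahead time horizon: drop it
        kept.append(name)
    return kept

features = ["CEILING", "Pred_AAR_1", "Pred_AAR_6", "SCHARR_2", "SCHARR_9"]
truncate_lookahead(features, 4)  # -> ['CEILING', 'Pred_AAR_1', 'SCHARR_2']
```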

Figure 4.5 shows the means (dots) and standard deviations (bars) of the F1-scores achieved on the ten test data folds for various look-ahead time horizons. For each airport, the GDP implementation curve is flat, indicating that the quality of these overall predictions did not change much with different look-ahead time horizons. Predictions of GDP initialization and cancellation at EWR improve as the horizon increases to four and two hours, respectively, which supports the claim that decision makers use predictions extending at least that far into the future when determining whether to initialize or cancel a GDP. For SFO, on the other hand, the characteristics of the quality of predictions as the look-ahead time horizon increases are consistent with the claim that decision makers only use predictions extending to the next hour when deciding whether or not to initialize a GDP, and that they use no predictions at all when deciding whether to cancel a GDP. Given that initial GDP plans can specify GDPs extending ten or more hours in the future, and that GDPs can cause ground delays now for flights that will not arrive for four or more hours, it is somewhat surprising that these results are consistent with relatively tactical GDP implementation decision making. However, there are several possible explanations for such tactical decision making. The dynamics of traffic demand and weather conditions are stochastic, so predictions at longer look-ahead time horizons may be subject to too much uncertainty to be helpful for decision makers. Furthermore, GDP implementation decisions can always be changed later, so decision makers may not always need to look very far into the future. A related explanation is that GDP parameters can be revised. For example, GDP rates can be increased or decreased, and the end time can be extended. This gives decision makers the ability to adjust a GDP to better fit unforeseen conditions, and may therefore reduce the need to look far into the future when determining whether to implement a GDP. Finally, when there is sufficient demand originating from nearby airports, GDPs may be able to effectively eliminate airborne delay that is expected just a couple of hours in the future. If this is the case, then anticipated airborne delay further in the future could be handled by implementing GDPs later, reducing the need to look more than a couple of hours ahead.

For the results in the remainder of this chapter, we used a look-ahead time horizon of four hours. Prediction quality does not meaningfully change for any of the models with longer look-ahead time horizons, so using this horizon should lead to more succinct models without sacrificing prediction quality. With a four-hour look-ahead time horizon, the models were given 125 features describing the state.

IRL: Parametric Investigation of Discount Factor

We also performed a parametric analysis to determine the impact of changing γ on the predictive performance of the reward regressor R̂_C. More precisely, we investigated 11 possible values of γ ranging from 0.0 to 1.0. A γ of 0.0 means that the objective does not consider rewards in future time steps at all, while a γ of 1.0 means that the objective places equal weight on rewards earned in any time step—even time steps far in the future. For each possible γ, we constructed a corresponding set of reward samples and then trained a corresponding reward regressor. Ten-fold cross validation was used to do this repeatedly, and the means (dots) and standard deviations (bars) of the R² values achieved on the test data sets by each regressor are plotted in Fig. 4.6. We used R² to quantify predictive performance because changing γ changes the magnitude and distribution of the reward sample data. The R² metric involves normalization in a way that permits comparisons of predictive performance as changes in γ change the truth data distribution.

The R² values in Fig. 4.6 range between around 0.15 and 0.30, which indicates that the regressor achieves a poor fit of the data. This may be a cause of the relatively poor predictive performance of the IRL model (see Section 4.4.2). For EWR, the R² increases as γ increases from 0.0, peaks at γ = 0.4, and then decreases as γ increases further. For SFO, the R² decreases monotonically as γ increases. These results also suggest that GDP implementation decision making is more tactical than strategic: it

Figure 4.5: Prediction quality of BC GDP Implemented model as look-ahead time horizon changes. (a) EWR. (b) SFO.

Figure 4.6: Fit of reward regressor as γ changes. (a) EWR. (b) SFO.

is not concerned much with rewards earned after the current time step. As was the case when we investigated the impact of look-ahead time horizon in Section 4.4.1, the results suggest that decision makers were more concerned with future conditions at EWR than at SFO.

For the remainder of this chapter, we used γ = 0.4 for the EWR IRL model because that value leads to the largest average R² value on the test data sets. For the SFO IRL model, we used γ = 0.2, even though a better fit is achieved at γ = 0.0. We selected this γ for SFO because using γ = 0.0 would make the IRL model simply a complicated BC model, and we want to compare an IRL model to a BC model. Furthermore, the average R² at γ = 0.2 is close to that at γ = 0.0, and well within the one-standard-deviation range, so we are not sacrificing much predictive performance.
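The effect of γ can be illustrated with the standard discounted-return formula. This is a generic illustration of how the discount factor weights future rewards, not the CSI algorithm's exact reward-sample construction:

```python
def discounted_return(rewards, gamma):
    """Total discounted reward: sum over t of gamma**t * r_t.

    gamma = 0.0 values only the immediate reward; gamma = 1.0 weights all
    time steps equally, no matter how far in the future they occur.
    """
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]
discounted_return(rewards, 0.0)  # -> 1.0 (immediate reward only)
discounted_return(rewards, 1.0)  # -> 4.0 (all rewards weighted equally)
```

A low estimated γ thus corresponds to tactical decision making: rewards more than a time step or two ahead contribute little to the objective.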

4.4.2 Prediction Quality Results

The prediction quality metrics for GDP Implemented models are computed based on three confusion matrices that can be constructed with the predictions for the testing data. The three matrices investigated here are constructed with the data in the ten test data folds. Each hour-long time step in the full data set is in a testing data fold exactly once, so the testing data contains each sample from the full data set exactly once. The first confusion matrix describes predictions of GDP implementation, the second describes predictions of GDP initialization, and the third describes predictions of GDP cancellation. The first matrix involves all the testing data points; the second and third matrices are based on only some of the testing data points, and each data point is involved in either the second or the third matrix but not both. For each confusion matrix, the accuracy, precision, recall, and F1-score are reported. Each of these metrics will be in the range [0, 1], with larger values indicating better predictive performance. Since precision and recall are particularly relevant for imbalanced data sets, such as those faced by the initialization and cancellation models, and since the F1-score is the harmonic mean of these two metrics, we view the F1-score as the most important single metric that quantifies the predictive performance for each confusion matrix.

Baseline: Quality of GDP Plan Model Predictions

We examined the quality of predictions produced by a baseline model that simply predicts that the previous GDP plan will be executed. If no GDP is planned for the hour in question, the model predicts that no GDP will be initialized, and if a GDP is planned, then the model predicts that it will continue and not be canceled.
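This baseline can be expressed in a few lines; the function and field names here are illustrative:

```python
def baseline_prediction(gdp_planned_this_hour):
    """Baseline: predict that the previous GDP plan will simply be executed.

    If a GDP is planned for the hour, predict the GDP continues (never canceled);
    if no GDP is planned, predict that none will be initialized.
    """
    if gdp_planned_this_hour:
        return {"gdp_implemented": True, "canceled": False}
    return {"gdp_implemented": False, "initialized": False}
```

Because this baseline never predicts an initialization or cancellation event, its recall for those events is necessarily zero and its precision is undefined (it makes no positive predictions), as the results below show.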

The F1-scores achieved by this model are 0.90 and 0.87 for GDP implementation at EWR and SFO, respectively, which is suggestive of high-quality predictions. However, this model achieved a recall of 0.00 and undefined precision and F1-score metrics for predictions of initializations and cancellations at both airports. Initialization and cancellation events are important operationally, so this model’s failure to predict these events reveals its limited operational value. Furthermore, this model’s performance illustrates the importance of evaluating models based on their predictions of initializations and cancellations, not just their predictions of GDP implementation.

Quality of BC GDP Implemented Model Predictions

Tables 4.1 and 4.2 show the three confusion matrices and related metrics achieved by the BC GDP Implemented models for EWR and SFO, respectively, when they were presented with the ten test data folds. For both EWR and SFO, the accuracy, precision, recall, and F1-score of the overall GDP Implemented models are relatively close to one. However, this strong overall performance masks how much the GDP Initialization and GDP Cancellation models struggle to predict the relatively infrequent initialization and cancellation events. Although the F1-scores for predictions of GDP initialization and cancellation are slightly higher for SFO than for EWR, they all range between 0.41 and 0.64. For EWR, and to a lesser extent for SFO, the low precision scores achieved by the GDP Initialization models indicate a high false alarm rate—they often predict that GDPs will be initialized when they are not. The GDP initialization data set is particularly imbalanced, with 34 and 26 times more non-initialization events than initialization events for EWR and SFO, respectively. This makes predicting initializations difficult. The GDP Cancellation models also

Table 4.1: Confusion Matrices for EWR BC GDP Implemented Model

(a) Implementation: accuracy = 0.94, precision = 0.84, recall = 0.92, F1-score = 0.88

    Actual    Predicted No GDP    Predicted GDP    Total
    No GDP    6810                348              7158
    GDP       154                 1788             1942
    Total     6964                2136             9100

(b) Initialization: accuracy = 0.95, precision = 0.31, recall = 0.60, F1-score = 0.41

    Actual             Predicted No Initialization    Predicted Initialization    Total
    No Initialization  6720                           273                         6993
    Initialization     84                             124                         208
    Total              6804                           397                         7201

(c) Cancellation: accuracy = 0.92, precision = 0.56, recall = 0.55, F1-score = 0.55

    Actual        Predicted Continuation    Predicted Cancellation    Total
    Continuation  1664                      70                        1734
    Cancellation  75                        90                        165
    Total         1739                      160                       1899

suffered from low precision scores, and again this is partially a result of the imbalanced nature of the data set. There were 11 and 7 times more non-cancellation events than cancellation events for EWR and SFO, respectively.

Quality of IRL GDP Implemented Model Predictions

Tables 4.3 and 4.4 show the three confusion matrices and related metrics achieved by the IRL GDP Implemented models for EWR and SFO, respectively, when they are presented with the ten test data folds. For both EWR and SFO, the IRL GDP Implemented models demonstrate substantially lower predictive performance than the BC GDP Implemented models. This is most evident for predictions of GDP initialization: the EWR IRL model predicts initialization much too frequently while the SFO IRL model predicts initialization too infrequently. The SFO IRL model also predicts GDP cancellation too infrequently.

Although it is difficult to determine exactly what caused the relatively poor predictive performance of the IRL GDP Implemented models, the results of diagnostic tests and analysis of the estimated regressor parameters (see Section 4.4.3) suggest that the poor predictive performance of the reward regressors was probably an important cause [96]. The reward regressors’ poor performance may in turn be the result of either selecting a reward regressor model form that is not suited to the regression problems or using an inaccurate or incomplete set of reward features. Several reward features were derived from the state of the ground and air buffer system model, and our intuition is that the fidelity of this system model may need to be improved. Finally, the IRL and BC models are based on different assumptions about decision making, which could help explain the difference in predictive performance. In particular, GDP implementation decision making may be more of a reaction to the current state that does not explicitly consider the impact of actions on future states, as assumed by BC approaches, than an attempt to deterministically and strategically achieve rewards accrued over time, as assumed by the CSI IRL algorithm.
The results of our analysis do not provide any insight into the reasons that GDP implementation decision making might exhibit this characteristic, but it could be related to training, procedures, or system uncertainties.

Table 4.2: Confusion Matrices for SFO BC GDP Implemented Model

(a) Implementation: accuracy = 0.95, precision = 0.85, recall = 0.90, F1-score = 0.87

    Actual    Predicted No GDP    Predicted GDP    Total
    No GDP    7625                337              7962
    GDP       206                 1842             2048
    Total     7831                2179             10010

(b) Initialization: accuracy = 0.96, precision = 0.50, recall = 0.87, F1-score = 0.64

    Actual             Predicted No Initialization    Predicted Initialization    Total
    No Initialization  7456                           257                         7713
    Initialization     39                             258                         297
    Total              7495                           515                         8010

(c) Cancellation: accuracy = 0.88, precision = 0.50, recall = 0.68, F1-score = 0.58

    Actual        Predicted Continuation    Predicted Cancellation    Total
    Continuation  1584                      167                       1751
    Cancellation  80                        169                       249
    Total         1664                      336                       2000

Table 4.3: Confusion Matrices for EWR IRL GDP Implemented Model

(a) Implementation: accuracy = 0.81, precision = 0.53, recall = 0.92, F1-score = 0.67

    Actual    Predicted No GDP    Predicted GDP    Total
    No GDP    5554                1604             7158
    GDP       159                 1783             1942
    Total     5713                3387             9100

(b) Initialization: accuracy = 0.78, precision = 0.085, recall = 0.67, F1-score = 0.15

    Actual             Predicted No Initialization    Predicted Initialization    Total
    No Initialization  5491                           1502                        6993
    Initialization     69                             139                         208
    Total              5560                           1641                        7201

(c) Cancellation: accuracy = 0.90, precision = 0.41, recall = 0.38, F1-score = 0.40

    Actual        Predicted Continuation    Predicted Cancellation    Total
    Continuation  1644                      90                        1734
    Cancellation  102                       63                        165
    Total         1746                      153                       1899

Table 4.4: Confusion Matrices for SFO IRL GDP Implemented Model

(a) Implementation: accuracy = 0.94, precision = 0.85, recall = 0.84, F1-score = 0.84

    Actual    Predicted No GDP    Predicted GDP    Total
    No GDP    7659                303              7962
    GDP       337                 1711             2048
    Total     7996                2014             10010

(b) Initialization: accuracy = 0.95, precision = 0.17, recall = 0.081, F1-score = 0.11

    Actual             Predicted No Initialization    Predicted Initialization    Total
    No Initialization  7595                           118                         7713
    Initialization     273                            24                          297
    Total              7868                           142                         8010

(c) Cancellation: accuracy = 0.88, precision = 0.50, recall = 0.26, F1-score = 0.34

    Actual        Predicted Continuation    Predicted Cancellation    Total
    Continuation  1687                      64                        1751
    Cancellation  185                       64                        249
    Total         1872                      128                       2000

4.4.3 Insight Results

Insight from BC GDP Implemented Model

The main form of insight available from the random forest models used in the BC GDP Implemented model was provided by the feature importance scores. The importance score for a feature is the total decrease in node “impurity” (as measured by the Gini splitting criterion) resulting from splits defined based on the feature, weighted by the proportion of samples reaching the corresponding nodes, and averaged over all the trees in the ensemble. According to this definition, important features define frequently-used splits and/or generate large improvements in the split criterion when they are used to define splits. Larger scores imply greater importance. We recorded the importance scores of the input features used by the models for each of the ten times the models were trained.

Figures 4.7 and 4.8 show the scores for the features with the ten highest importance scores for the initialization and cancellation models that make up the BC GDP Implemented models for EWR and SFO, respectively. The height of each bar is the mean of the importance scores for each feature according to the ten models constructed for the ten training data sets used in cross validation, and the error bars show the standard deviation of these ten importance scores. Features whose names end with a “ k” for some integer k are predictions of the feature for the hour starting k hours from the time of the prediction.

For the EWR GDP Initialization model, features related to the AAR (“AAR”) or a prediction of the AAR (“Pred AAR”) made up five of these ten features. This suggests that future AAR levels have a relatively strong influence on GDP initialization, which makes sense given that GDPs are used when expected arrival demand exceeds expected arrival capacity and that arrival capacity is quantified by AAR.
Features related to scheduled arrivals (“SCHARR”) accounted for another four of the ten features, which is also not surprising because GDPs are used to reduce arrivals when predictions suggest that there might be an excessive number of arrivals. The tenth feature in this set is a prediction of the number of departure Centers for which reroutes are required three hours in the future (“Centers Rerouted 3”), suggesting

Figure 4.7: Features with highest importance scores for the EWR BC GDP Implemented model. (a) GDP Initialization model. (b) GDP Cancellation model.

Figure 4.8: Features with highest importance scores for the SFO BC GDP Implemented model. (a) GDP Initialization model. (b) GDP Cancellation model.

that GDPs are initialized at EWR to help reduce demand for constrained airspace. This makes sense because flights bound for EWR typically traverse highly congested airspace in the northeastern US.

For the EWR GDP Cancellation model, high average importance scores were achieved by five features related to parameters of the previous GDP plan, such as the planned time until the end of the GDP (“Prev GDP LATS to end”) and planned GDP rates (“Prev GDP Rate”). This suggests that GDPs are unlikely to be canceled when the previous plan indicates that they are far from finishing, which makes sense since experts, knowing that stakeholders may value predictable GDP plans, presumably attempt to select appropriate GDP end times [89]. The planned rates may be important because they are another way to learn the planned remaining duration, or perhaps because they may indicate the degree to which capacity is diminished (GDPs might be less likely to be canceled when capacity is lower). Features related to scheduled arrivals (“SCHARR”) accounted for four of the remaining features achieving the ten highest importance scores. This makes sense because if scheduled arrivals are low, a GDP may no longer be needed. The final feature in this set is a reroute-related feature (“Centers Rerouted 2”), again suggesting that EWR GDPs might be partially caused by congested airspace that requires new routes for flights bound for EWR.

Of the ten features with the largest average importance scores for the SFO GDP Initialization model, five were related to observations or forecasts of the ceiling or meteorological conditions valid at some hour in the upcoming two hours (“CEILING”, “Ceiling 0”, “Ceiling 1”, “Met Conds 0”, and “Met Conds 1”). “CEILING” is the observation of the ceiling at the current hour recorded in a METAR report. “Ceiling 0” is the prediction of the ceiling from a TAF forecast that is valid for the current hour. The TAF forecast may have been published very recently, or it may have been published up to three hours ago. This dependence on ceilings or meteorological conditions is not surprising given that SFO GDPs are largely caused by low ceilings [110]. Only one feature in this set was an AAR (“Pred AAR 1”), which is fewer than the five such features for the EWR GDP Initialization model. This might be because the weather conditions that lead to lower capacity at SFO are relatively straightforward to describe and quantify, thus enabling the SFO model to identify and depend upon

them directly, while the capacity at EWR is a more complicated function of a variety of weather and other conditions, leading the EWR model to depend on AAR predictions rather than directly on weather conditions. Features related to scheduled arrivals made up the remaining four features, which is not surprising for reasons described earlier.

Finally, for the SFO GDP Cancellation model, four of the ten features achieving the largest average importance scores came from the previous GDP plan (such as “Prev GDP LATS to end” and planned GDP rates). This makes sense for the reasons described in our discussion of the EWR cancellation model importance scores. Five other features in this set were predicted or current AARs, and the final feature in the set was the current ceiling.
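Aggregating these importance scores across the ten cross-validation models can be sketched with scikit-learn's feature_importances_ attribute, which implements the impurity-decrease measure described above (the function name and data below are illustrative, not the dissertation's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def importance_stats(models, feature_names, top_k=10):
    """Mean and standard deviation of Gini importance scores across CV models.

    `models` are the forests trained on the cross-validation training folds;
    feature_importances_ is normalized to sum to one within each forest.
    Returns the top_k features sorted by mean importance, descending.
    """
    scores = np.array([m.feature_importances_ for m in models])
    means, stds = scores.mean(axis=0), scores.std(axis=0)
    order = means.argsort()[::-1][:top_k]
    return [(feature_names[i], means[i], stds[i]) for i in order]
```

The means give the bar heights and the standard deviations the error bars in plots like Figs. 4.7 and 4.8.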

Insight from IRL GDP Implemented Model

Although our evaluation suggests that the predictive power of the IRL GDP Implemented model is low, the estimated parameters θ̂ for the reward function regressor R̂_C(s, a) = θ̂ᵀf(s, a) may still provide some useful insights. These insights should be viewed with suspicion, however, because R̂_C struggled to fit the reward sample training data. In the ten testing data sets used in cross validation of the reward regressor, the average R² value achieved by R̂_C was only 0.30 for EWR and 0.21 for SFO (see Section 4.4.1). These low values suggest that R̂_C was not explaining much of the variation of the reward samples in the training data set. We suspect that this poor performance was an important cause of the poor predictive power of the IRL GDP Implemented model.

Table 4.5 shows average reward parameter estimates and corresponding average p-values for the ten regressors trained with the ten training data folds used in cross validation. Before training R̂_C, scalar-valued reward features were standardized over the time steps for which they are defined, while the lone indicator feature (which takes a value of one when the air buffer at the end of the time step is greater than or equal to five, and zero otherwise) was not standardized. This facilitates interpretation of the parameter estimates by making comparisons of their relative magnitudes more meaningful. When constructing reward samples to train R̂_C, we used γ = 0.4 for EWR and γ = 0.2 for SFO. Therefore, the ranges of possible and typical reward sample values differ between the two airports, so we cannot directly compare the values of the estimated parameters between airports.

Table 4.5: Properties of Parameters of R̂_C for EWR and SFO

    Reward Feature (f_m(s, a))             EWR θ̂_m   EWR p-value   SFO θ̂_m   SFO p-value
    constant                                0.53      < 0.001        0.70      < 0.001
    ground buffer at end of time step       0.0040    0.14          −0.022     < 0.001
    change in ground buffer                 0.0078    0.14           0.0070    0.19
    air buffer at end of time step          0.015     < 0.001       −0.0035    0.40
    change in air buffer                   −0.035     < 0.001       −0.027     < 0.001
    air buffer at end of time step ≥ 5     −0.24      < 0.001       −0.19      < 0.001
    arrivals                               −0.029     < 0.001        0.0056    0.076
    unused slots during rate control       −0.15      < 0.001       −0.21      < 0.001
    duration of GDP canceled               −0.041     0.040         −0.050     0.039

The reward regressors for the two airports are remarkably similar. This similarity is consistent with the conjecture that while different phenomena cause congestion issues leading to GDPs at these two airports (as evidenced by the different important features in the two BC GDP Initialization models described in Section 4.4.3), GDPs are implemented to achieve roughly the same objectives at both airports. Furthermore, this similarity illustrates the potential of IRL algorithms to identify reward functions and corresponding policies that generalize—working in a variety of contexts, including those not represented in training data. More specifically, if non-constant reward features that achieve p-values less than 0.05 are sorted from largest to smallest magnitude of the average corresponding parameter estimate, the order of the top four features for EWR is as follows: the indicator that the air buffer at the end of the time step is greater than or equal to five, the number of unused slots during rate control, the duration of GDP canceled, and the change in the air buffer. The order for SFO is nearly identical (only the top two features are swapped). These parameter estimates are all negative, as would be expected. The estimates quantify the balance achieved by traffic managers as they face a fundamental trade-off in GDP implementation: airborne delay is expensive, and implementing GDPs can help reduce it, but excessive GDP implementation can lead to undesired under-utilization of available capacity. The objective functions used in various algorithms [15, 116], including one in an operational GDP decision-support tool [114], all specify some balance for this trade-off. However, as far as we know, this is the first time that the balance achieved in current operations has been inferred directly from historical traffic flow management initiatives and related data. Furthermore, the estimate for the parameter corresponding to the duration of GDP canceled feature provides a quantification of the relative importance of continuing an existing GDP plan, which is related to the predictability metric identified by Liu and Hansen in [89].

If we investigate the average parameter estimates for less important features (namely, those that achieve relatively high p-values and/or average parameter estimates with small magnitudes), then we find some differences between the regressors for the two airports. For example, while the average parameter estimates for buffer levels are negative for the regressors for SFO, as would be expected, they are positive for EWR. These counter-intuitive parameter estimates may help explain why the EWR IRL GDP Implemented model over-predicted GDP initialization.
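The standardize-then-fit procedure for a linear reward regressor of this form can be sketched as plain least squares (an illustration only; the dissertation's actual regressor and its p-value computation may differ, and the function name and column layout are assumptions):

```python
import numpy as np

def fit_reward_regressor(F, r, indicator_cols=()):
    """Least-squares fit of r ≈ θᵀf(s, a) after standardizing scalar feature columns.

    Columns listed in indicator_cols (e.g., the 0/1 'air buffer ≥ 5' flag)
    are left unstandardized; standardizing the scalar columns makes the
    magnitudes of their coefficients directly comparable. A constant
    column is prepended for the intercept term.
    """
    F = np.array(F, dtype=float)  # copy so the caller's data is untouched
    for j in range(F.shape[1]):
        if j not in indicator_cols:
            F[:, j] = (F[:, j] - F[:, j].mean()) / F[:, j].std()
    X = np.hstack([np.ones((F.shape[0], 1)), F])
    theta, *_ = np.linalg.lstsq(X, r, rcond=None)
    return theta  # theta[0] is the constant; compare |theta[1:]| across features
```

With standardized inputs, sorting the non-constant coefficients by magnitude (as done for Table 4.5) ranks the features by their influence on the fitted reward.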

4.5 Conclusions

GDPs seem to be a tool for strategically managing traffic in an effort to achieve desired values for certain metrics that are accrued over time, suggesting that IRL may be a promising technique for GDP analytics. Therefore, we compared IRL models of GDP implementation to BC models. More precisely, we developed BC models of GDP implementation that are based on random forest models of GDP initialization and GDP cancellation. We used the CSI IRL algorithm to infer reward functions consistent with historical state and action data and then used rollouts to find policies attempting to optimize expected total discounted reward objectives based on the inferred reward functions. Furthermore, we implemented BC models for GDP parameters that are used by the rollout policies. The models were developed for

EWR and SFO and evaluated using cross validation on a data set consisting of 455 days from the summers of 2011–2013.

When predicting GDP implementation on testing data, the BC GDP Implemented models we developed for EWR and SFO demonstrate substantially stronger predictive performance than the IRL GDP Implemented models we developed. The relatively poor performance of the IRL models may be caused by inaccuracies in the simple ground and air buffer model, an incomplete set of reward features, an ill-suited reward regressor form, or the invalidity of some of the underlying assumptions made by the CSI IRL algorithm, such as deterministic decision making that seeks to strategically achieve rewards accrued over time. Our experiments also suggest that neither the BC nor the IRL models predict the relatively infrequent GDP initialization or cancellation events well.

We also investigated the structure of the models in order to gain insights into GDP implementation decision making. Features related to predictions of conditions more than four hours in the future do not improve the predictive power of the BC GDP Implemented models. Similarly, we selected low discount factors of 0.4 for EWR and 0.2 for SFO for use in the IRL algorithm because these values led to reward training data that the reward regressors were better able to fit. These characteristics of the BC and IRL models are inferred from historical data and suggest that GDP implementation decisions are more tactical than strategic: they are made primarily based on current conditions or conditions anticipated in only the next couple of hours. Feature importance scores derived from the random forest BC GDP Initialization and GDP Cancellation models suggest that the set of the most important features varies between airports. Features related to scheduled arrivals, predicted airport arrival capacity levels, the previous GDP plan, certain weather conditions, and reroutes are most important for one or both airports.
The reward functions inferred by the IRL algorithm are not able to achieve a good fit of the training data, but their structures suggest that decision makers at both airports are primarily concerned with avoiding relatively large numbers of flights that must incur delay in the air, avoiding unused arrival slots while delaying flights on the ground to achieve a certain rate of arrivals at the airport, and avoiding canceling a GDP long before its planned end time.

Chapter 5

Conclusions and Future Work

The complex ATM system is largely controlled by human decision makers, so we set out to improve this important system by enabling the development of tools that can assist these decision makers. Such tools are a first step towards an increasingly autonomous ATM system that provides greater value to stakeholders by taking advantage of the strengths of both computer-based systems and humans [6]. In particular, we enabled tools by developing decision models and corresponding solution algorithms. These can enable decision support tool development by generating proposed decisions, providing insights into decision making, and predicting decisions. Our approach to building these models and algorithms leveraged expert input and feedback, operational data analytics, fast-time simulations, and human-in-the-loop simulations. We utilized and extended techniques from optimization, dynamic programming, and machine learning to develop solution algorithms and to make inferences about decisions based on operational data.

5.1 Contributions

To enable the development of a decision support tool for area supervisors, we developed a prescriptive decision model for finding a good schedule of area configurations. The model was developed based on interactions with experts, operational data analytics, and fast-time simulations. Some possible inputs to this model, such


as characteristics of individual controllers, were not included in the model due to challenges related to quantifying these characteristics. Furthermore, it is difficult to model the relationship between area configurations and safe and efficient traffic operations. Therefore, we developed and evaluated novel algorithms that propose a set of near-optimal and meaningfully-different configuration schedule advisories. One of these algorithms was incorporated into the OASIS decision-support tool, which was evaluated in a human-in-the-loop experiment. This experiment validated the model and algorithm: the average acceptability of selected algorithm-generated advisories was more than four out of five, and user modifications to selected advisories were minor and led to no significant improvement in acceptability.

Airline delay cost models are essential to air traffic flow management research aimed at decision-support tool development, so it is no surprise that researchers have proposed many such models. We used operational data describing airline decisions in AFPs to evaluate these models. We developed and validated a novel heuristic that finds cost models and corresponding noise parameters that maximize an approximation of the likelihood of the recorded airline decision data. This novel heuristic produced new insights into airline delay costs, and we anticipate that these will prove valuable in the development of decision-support tools related to TFM.

GDPs are a powerful tool for TFM, so we built models for predicting and understanding GDP implementation decision making. The strategic nature of GDPs and the existence of metrics that seem to direct their implementation motivated us to develop an inverse reinforcement learning model of GDP implementation. To the best of our knowledge, this model is the first inverse reinforcement learning model of an ATM decision.
We found that behavioral cloning models produce higher quality predictions of GDP implementation, although both types of model struggle to predict relatively infrequent GDP initialization and cancellation events. Predictive models of this type could be useful in decision-support tools for airlines managing flights affected by GDPs. The structure of the models provided novel insights into GDP implementation decision making. For example, properties of both types of model suggest that GDP implementation decisions are more tactical than strategic: they are primarily made based on conditions now or conditions anticipated in the next couple of hours. Furthermore, the reward functions inferred by inverse reinforcement learning models yield new insights into the metrics that guide GDP implementation decisions. These insights could provide motivation or direction for those seeking to build tools to support GDP decision makers [47].
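For intuition, behavioral cloning amounts to supervised learning on recorded decisions: fit a classifier that maps the current situation to the action the decision maker took. The sketch below is illustrative only; the features, the synthetic data, and the simple logistic model are hypothetical stand-ins, not the models developed in this dissertation.

```python
import math
import random

random.seed(0)

# Hypothetical hourly examples: features are (demand minus capacity,
# low-visibility flag); the label is 1 if a GDP was recorded as in effect.
examples = []
for _ in range(200):
    x = (random.gauss(0.0, 1.0), random.choice([0.0, 1.0]))
    examples.append((x, 1 if x[0] + x[1] > 0.5 else 0))

# Clone the recorded policy with a logistic model fit by batch gradient descent.
w = [0.0, 0.0]
b = 0.0
for _ in range(1000):
    gw = [0.0, 0.0]
    gb = 0.0
    for x, y in examples:
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        gw[0] += (p - y) * x[0]
        gw[1] += (p - y) * x[1]
        gb += p - y
    w = [w[i] - gw[i] / len(examples) for i in range(2)]
    b -= gb / len(examples)

acc = sum((1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
          for x, y in examples) / len(examples)
print(f"training accuracy of the cloned policy: {acc:.2f}")
```

A fitted model of this kind predicts the decision the human would likely make in a new situation, which is the sense in which behavioral cloning supports prediction of GDP implementation.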

5.2 Future Work

All of the models we developed could benefit from access to a richer set of features describing the current situation. For example, delay cost models could be improved if they more explicitly accounted for the airline's daily schedule. The OASIS decision-support tool might be more useful if it had access to detailed daily staff plans, which could indicate with relative precision how many controllers would be available at various times. Richer weather and weather forecast data, higher-resolution flight data, and information about airline preferences would all be beneficial for our GDP implementation models.

The models we developed fail to explicitly account for uncertainties that impact decisions. For example, the model and algorithms for finding area configuration schedule advisories do not account for uncertainties in predictions of future traffic, nor do they consider uncertain future changes in airspace capacity due to weather conditions. The slot utilization decisions faced by airlines during AFPs are impacted by uncertainties such as when the program will end or if the program rate will change. Uncertainties in future weather conditions and airline responses to GDPs (such as flight cancellations) can impact GDP implementation decisions. Explicitly handling these uncertainties would lead to more realistic and helpful descriptive and prescriptive decision models.

The research in this dissertation is aimed at enabling decision-support tools to improve the ATM system, and the development of these tools will require additional work. Our model and algorithm were incorporated into the OASIS tool, which shows promise, but further work is needed before that tool is deployed operationally. Tools to support decisions related to GDPs have been deployed and are being researched [47, 89, 114]; our hope is that our work will facilitate these efforts. In general, these tools should keep humans engaged and leverage their strengths, as well as the strengths of data analytics, optimization, and computation. Although we did not test this hypothesis, it may be that the technique used in the OASIS tool, by requiring a human decision maker to select from among a set of good and distinct options computed by an algorithm, appropriately leverages the strengths of humans and computation. This may be a promising technique for other decision-support tools. Of course we have only researched a few of the many decisions involved in ATM, so the approach we have utilized to model decisions and develop solution algorithms could be deployed to enable tools to support other ATM decisions.

Finally, some of the ATM decisions currently made by humans could be made by systems that adapt and learn from their successes and mistakes. Our approach for developing decision models and solution algorithms to enable decision-support tools could also provide a foundation for increasingly autonomous systems that can make good decisions on their own. Human decision makers have made the ATM system one of the technological wonders of the modern world [66], and our approach could give increasingly autonomous systems the benefit of learning from the skill and experience of these remarkable humans.

Appendix A

Forward A∗ Algorithm

The Forward A∗ algorithm (ForwardA∗) is specified in Algorithm 4.

Algorithm 4 ForwardA∗(C, T)
Require: C = {C_k}_{k=0}^{K} {Valid configuration schedule advisories}
Require: T {Traffic situation data}
  closed ← ∅
  open ← priority queue containing C_0 with key 0
  C_k ← minimum-key configuration in open
  while C_k ∉ C_K do
    Add C_k to closed
    for C_{k+1} ∈ C_{k+1} do
      J ← J̄_k(C_k) + g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1})
      if C_{k+1} ∈ open and J ≤ J̄_{k+1}(C_{k+1}) then
        Remove C_{k+1} from open
      if C_{k+1} ∉ open then
        J̄_{k+1}(C_{k+1}) ← J
        C†_k(C_{k+1}) ← C_k
        Add C_{k+1} to open with key J̄_{k+1}(C_{k+1}) + Ĵ_{k+1}(C_{k+1})
    C_k ← minimum-key configuration in open
  Construct the advisory from C_k by iteratively using C†_{k−1}(C_k)
  return the advisory
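As a toy illustration of this search pattern (not the dissertation's code), the following sketches forward A∗ over a small time-expanded configuration graph; the stage sets, transition costs, and admissible zero heuristic are all hypothetical.

```python
import heapq

# Hypothetical time-expanded graph: C_k for k = 0, 1, 2.
stages = [["C0"], ["A", "B"], ["A", "B"]]

def g(ck, ck1):
    """Hypothetical transition cost: reconfiguring costs 2, staying costs 0."""
    return 0.0 if ck == ck1 or ck == "C0" else 2.0

def h(k):
    """Admissible heuristic (underestimate of cost-to-go); zero here."""
    return 0.0

def forward_a_star():
    # Queue entries: (key = cost-so-far + heuristic, cost-so-far, stage, path).
    open_q = [(0.0, 0.0, 0, ("C0",))]
    closed = set()
    while open_q:
        _, cost, k, path = heapq.heappop(open_q)
        if k == len(stages) - 1:
            return path, cost  # first final-stage pop is optimal
        if (k, path[-1]) in closed:
            continue
        closed.add((k, path[-1]))
        for nxt in stages[k + 1]:
            j = cost + g(path[-1], nxt)
            heapq.heappush(open_q, (j + h(k + 1), j, k + 1, path + (nxt,)))
    return None

print(forward_a_star())
```

Because the heuristic never overestimates the remaining cost, the first complete schedule popped from the queue is a minimum-cost one, which is the property the algorithm above relies on.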

Appendix B

The NP-Completeness of M-ε-d-CSAs

The M-ε-d-CSAs problem is NP-complete. This will be shown by first demonstrating that the Independent Set (IS) problem, which is NP-complete and even difficult to approximate [123], is polynomial-time reducible to a decision version of M-ε-d-CSAs. This is sufficient to show that the M-ε-d-CSAs problem is NP-hard [83]. First, a review of the IS problem will be provided. Given a graph G = (V, E), a set of nodes is independent if none of the nodes are joined by an edge in G. The decision version of the IS problem is to report whether it is possible to find a set of nodes U ⊂ V of size |U| = M such that all of the M nodes are independent on G.

Now IS will be shown to be polynomial-time reducible to a decision version of M-ε-d-CSAs. This will be done by demonstrating, for an arbitrary instance I_IS of the IS problem, how to construct, in polynomial time, an instance I_{M-ε-d-CSAs} of M-ε-d-CSAs such that

1. if I_IS is a “yes” instance of IS (that is, an independent set of the required size can be found), then there exists a feasible solution for I_{M-ε-d-CSAs}, and

2. if there exists a feasible solution for I_{M-ε-d-CSAs}, then I_IS is a “yes” instance of IS.

Given an arbitrary IS instance I_IS on a graph G(V, E), the corresponding decision instance I_{M-ε-d-CSAs} is constructed in polynomial time as follows. The number of nodes M required in I_IS maps to the number of solutions M required in I_{M-ε-d-CSAs}. The cost function in I_{M-ε-d-CSAs} is defined to be equal to zero (g_k(·, ·, ·, ·) ≜ 0) so that I_{M-ε-d-CSAs} becomes a feasibility or decision problem. Then, a time-expanded graph is constructed for use in the I_{M-ε-d-CSAs} problem instance. For this graph, K = 1. This graph consists of a single starting node (a required starting configuration C_0 that makes up the set C_0). For k = 1, any of the nodes V represent valid configurations (C_1 ≜ V). This means that any of the required M valid configuration schedule advisories will select a single node from V for C_1^m. The configuration difference metric φ is defined as follows:

φ(C_k, C′_k) =
    1  if C_k and C′_k do not share an edge in G(V, E),
    0  otherwise.

The configuration difference metric is defined such that two configuration schedule advisories achieve an advisory difference of 1 if and only if the configurations used by the advisories at k = 1 (which correspond to nodes from V ) do not share an edge in the IIS graph G(V,E). Finally, the minimum required difference d for IM-ε-d-CSAs will be defined as d ! 1.

Next, the first condition required of I_IS and I_{M-ε-d-CSAs} will be verified. Suppose that the arbitrary I_IS is a “yes” instance of IS: there exists a set of M nodes U ⊂ V such that each node in U is independent from all of the other nodes in U. The set U can be converted into M configuration schedule advisories that make up a feasible solution for I_{M-ε-d-CSAs}. The m-th configuration schedule advisory is defined with the required C_0 and with C_1^m equal to the m-th node from U. Since U is an independent set, each node in U shares no edge in E with any other node in U. This means that, due to the definition of the configuration difference metric for I_{M-ε-d-CSAs}, the M configuration schedule advisories defined as described will meet the advisory difference constraint (2.16). These M configuration schedule advisories are feasible for I_{M-ε-d-CSAs}, and the condition has been verified.

Finally, the second condition required of I_IS and I_{M-ε-d-CSAs} will be verified. Suppose that there exists a feasible solution for I_{M-ε-d-CSAs}. Let this feasible solution be {C^1, ..., C^M}, where each C^m is a configuration schedule advisory: C^m = {C_0^m, C_1^m} for m = 1, ..., M. A “yes” instance of I_IS can be constructed as follows. Let U = {C_1^m}_{m=1}^{M} ⊂ V be a set of size M that we will use to demonstrate that there exists an independent set of size M for I_IS. Since {C^1, ..., C^M} make up a feasible solution for I_{M-ε-d-CSAs}, we know that they satisfy the advisory difference constraint. Due to how the configuration difference metric is defined, this means that each of the C_1^m do not share an edge in E with any other C_1^{m′}. This means that U = {C_1^m}_{m=1}^{M} is an independent set of size M on G(V, E), verifying that I_IS is indeed a “yes” instance of IS.

Furthermore, it is trivial to show that a solution to the decision version of M-ε-d-CSAs can be efficiently certified. Therefore, it is not only NP-hard but also NP-complete.

Appendix C

Reverse Value Iteration Algorithm

The Reverse Value Iteration algorithm (ReverseVI) is described in Algorithm 5.

Algorithm 5 ReverseVI(C, T)
Require: C = {C_k}_{k=0}^{K} {Valid configuration schedule advisories}
Require: T {Traffic situation data}
  for C_K ∈ C_K do
    J̄∗_K(C_K) ← 0
  for k = K − 1, ..., 0 do
    for C_k ∈ C_k do
      J̄∗_k(C_k) ← min_{C_{k+1} ∈ C_{k+1}} [g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1}) + J̄∗_{k+1}(C_{k+1})]
      C̄∗_{k+1}(C_k) ∈ argmin_{C_{k+1} ∈ C_{k+1}} [g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1}) + J̄∗_{k+1}(C_{k+1})]
  return {J̄∗_k(C_k)}_{k=0}^{K−1}, {C̄∗_{k+1}(C_k)}_{k=0}^{K−1}
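For intuition, the backward pass of this value iteration can be sketched on a toy time-expanded graph; the configuration sets and stage costs below are hypothetical, not the dissertation's data.

```python
# Hypothetical time-expanded graph and costs.
stages = [["C0"], ["A", "B"], ["A", "B"]]

def g(ck, ck1):
    """Hypothetical transition cost: reconfiguring costs 2, staying costs 0."""
    return 0.0 if ck == ck1 or ck == "C0" else 2.0

K = len(stages) - 1
J = {(K, c): 0.0 for c in stages[K]}      # terminal cost-to-go J*_K = 0
policy = {}
for k in range(K - 1, -1, -1):            # k = K-1, ..., 0 (backward)
    for ck in stages[k]:
        costs = {c1: g(ck, c1) + J[(k + 1, c1)] for c1 in stages[k + 1]}
        best = min(costs, key=costs.get)
        J[(k, ck)] = costs[best]          # minimum cost-to-go from (k, ck)
        policy[(k, ck)] = best            # a cost-to-go-minimizing successor

print(J[(0, "C0")])                       # optimal total cost from C0
```

Following `policy` forward from the initial configuration recovers a minimum-cost schedule, while the table `J` of cost-to-go values is what the later algorithms in these appendices reuse for pruning.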

Appendix D

Recursive Value Iteration Fraction Optimal Algorithm

The Recursive Value Iteration Fraction Optimal algorithm (RecursiveVIFO) is described in Algorithm 6. It is a recursive implementation of the algorithm proposed by Byers and Waterman [40].

Algorithm 6 RecursiveVIFO(C, {J̄∗_k(C_k)}_{k=0}^{K−1}, C_k, J_max, J)
Require: C = {C_k}_{k=0}^{K} {Valid configuration schedule advisories}
Require: {J̄∗_k(C_k)}_{k=0}^{K−1} {Minimum cost-to-go values}
Require: C_k ∈ C_k {Last configuration in partial advisory under consideration}
Require: J_max ∈ R_+ {Upper bound on advisory cost}
Require: J ∈ R_+ {Cost incurred so far by the partial advisory under consideration}
  C^ε ← ∅
  for C_{k+1} ∈ C_{k+1} do
    if J + g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1}) + J̄∗_{k+1}(C_{k+1}) ≤ J_max then
      if k + 1 = K then
        Add partial advisory {C_k, C_K} to C^ε
      else
        C^{ε,C_{k+1}} ← RecursiveVIFO(C, {J̄∗_k(C_k)}_{k=0}^{K−1}, C_{k+1}, J_max, J + g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1}))
        for each partial advisory C̄ ∈ C^{ε,C_{k+1}} do
          C̄′ ← {C_k, C̄} {Prepend C_k to the partial advisory C̄}
          Add C̄′ to C^ε
  return C^ε
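The enumeration idea behind Algorithm 6 can be sketched as follows: a partial advisory is extended only when its cost-so-far plus the exact cost-to-go lower bound stays within J_max, so every advisory within the bound is found without expanding hopeless branches. The toy graph and costs are hypothetical.

```python
# Hypothetical time-expanded graph and costs.
stages = [["C0"], ["A", "B"], ["A", "B"]]

def g(ck, ck1):
    """Hypothetical transition cost: reconfiguring costs 2, staying costs 0."""
    return 0.0 if ck == ck1 or ck == "C0" else 2.0

# Cost-to-go table, as produced by reverse value iteration.
K = len(stages) - 1
J = {(K, c): 0.0 for c in stages[K]}
for k in range(K - 1, -1, -1):
    for ck in stages[k]:
        J[(k, ck)] = min(g(ck, c1) + J[(k + 1, c1)] for c1 in stages[k + 1])

def recursive_vifo(k, ck, cost, j_max):
    """Return all advisories from (k, ck) with total cost at most j_max."""
    advisories = []
    for c1 in stages[k + 1]:
        step = cost + g(ck, c1)
        if step + J[(k + 1, c1)] <= j_max:   # prune: even the best tail is too costly
            if k + 1 == K:
                advisories.append([ck, c1])
            else:
                for tail in recursive_vifo(k + 1, c1, step, j_max):
                    advisories.append([ck] + tail)
    return advisories

print(recursive_vifo(0, "C0", 0.0, 2.0))     # all advisories with cost <= 2
```

Tightening the bound shrinks the returned set: with j_max = 0 only the two zero-cost schedules survive, while j_max = 2 also admits the schedules with one reconfiguration.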

Appendix E

Properties of FBVISAS

Although more restrictive than is necessary, we will define and utilize simple problem instances.

Definition 1 (simple M-ε-d-CSAs problem instance) A simple M-ε-d-CSAs problem instance is one in which

• M = 2,

• ε = ∞,

• the configuration constraints are such that C_1 = C_2 = ··· = C_K and that C_0 ∈ C_1,

• and there exists a unique optimal first advisory (|C∗_CSA(C, T)| = 1).

E.1 Simple Reconfiguration Cost-Dominated Problem Instances

Definition 2 (reconfiguration cost-dominated M-ε-d-CSAs problem instance) Reconfiguration cost-dominated M-ε-d-CSAs problem instances are those in which β^R is large enough that

min_{k ∈ {1,...,K}}  min_{C, C′ ∈ C_k, C ≠ C′}  β^R g^R_k(C, T_{k−1}, C′, T_k)  >  Σ_{k=1}^{K}  max_{C, C′ ∈ C_k}  [g^S_k(C, T_k) − g^S_k(C′, T_k)].


The left hand side of this inequality is the lowest cost of changing configurations during the entire time horizon. The right hand side is the sum over all the configuration time steps of the largest difference between static costs at each time step. An important characteristic of instances that are both simple and reconfiguration cost-dominated is that advisories never achieve lower total costs by changing configurations—it is always better to remain in the current configuration.
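Definition 2 can be checked numerically for a concrete instance: compare the cheapest possible reconfiguration against the summed worst-case static-cost gaps. The cost functions and constants below are hypothetical toy values, not the dissertation's cost models.

```python
from itertools import product

configs = ["A", "B"]   # hypothetical configuration set (same at every time step)
K = 3
beta_R = 10.0          # hypothetical reconfiguration cost weight

def g_R(c, c1):
    """Hypothetical reconfiguration cost."""
    return 1.0 if c != c1 else 0.0

def g_S(k, c):
    """Hypothetical static cost at time step k."""
    return 0.5 if c == "A" else 1.0

# Left-hand side: the lowest cost of changing configurations anywhere in the horizon.
lhs = min(beta_R * g_R(c, c1)
          for k in range(1, K + 1)
          for c, c1 in product(configs, configs) if c != c1)

# Right-hand side: summed largest static-cost differences at each time step.
rhs = sum(max(g_S(k, c) - g_S(k, c1) for c, c1 in product(configs, configs))
          for k in range(1, K + 1))

print(lhs > rhs)   # True here: the instance is reconfiguration cost-dominated
```

When the printed condition holds, no reconfiguration can ever pay for itself through static-cost savings, which is exactly the stay-put property used in Lemma 1.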

Lemma 1 The single optimal first advisory for simple reconfiguration cost-dominated problem instances uses the initial configuration for all time steps k ∈ {0, 1, ..., K}.

Proof of Lemma 1 The definition of simple instances ensures that such an advisory meets the configuration constraints. The definition of reconfiguration cost-dominated instances ensures that lower advisory costs can always be achieved by staying in the same configuration rather than by changing configurations. Therefore, using the initial configuration for the entire time horizon is the unique minimum-cost advisory.

Proposition 1 If one exists, FBVISAS finds an optimal second advisory for simple reconfiguration cost-dominated problem instances.

Proof of Proposition 1 By Lemma 1, we know that the unique optimal first advisory uses the initial configuration C_0 = C for the entire problem instance time horizon. The M-ε-d-CSAs problem statement requires that an optimal second advisory use a different airspace configuration for at least d time steps. To meet this constraint, an optimal second advisory must reconfigure at some time step k to a new configuration C′ such that C^A ≠ C′^A. We will investigate how FBVISAS handles this configuration C′ at time step k to show by contradiction that it must return an optimal second advisory for these instances.

Before we investigate the operation of FBVISAS on these instances, we will establish some properties of an optimal second advisory. In particular, we will study two parts of an optimal second advisory: the partial optimal second advisory from C′ in time step k to a final configuration in time step K and the partial optimal second advisory from C in time step 0 to C′ in time step k.

• Partial optimal second advisory from C′ at time step k: Consider the partial optimal second advisory from C′ at time step k to the end of the time horizon at time step K. For the CSA problem instance corresponding to the M-ε-d-CSAs problem instance, the unique cost-minimizing partial advisory from C′ at time step k until the final time step K will remain at C′ because this is a simple reconfiguration cost-dominated instance. We will show that an M-ε-d-CSAs-optimal second advisory also remains at C′ for the remainder of the time period. The change in configuration at time step k in this optimal second advisory must have occurred early enough to fulfill the difference constraint in the time steps from k to K. Since C^A ≠ C′^A, a partial advisory that remains at C′ will therefore also achieve the difference constraint. Such a partial advisory also minimizes the M-ε-d-CSAs objective, so it is the portion of the optimal second advisory after time step k. Overall, we note that an M-ε-d-CSAs-optimal second advisory uses the unique CSA-minimal cost-to-go partial advisory from C′ at time step k, and this partial advisory achieves the minimum required difference from the optimal first advisory.

• Partial optimal second advisory to C′ at time step k: Now consider the part of this optimal second advisory from C at time step 0 to C′ at time step k. For the CSA problem corresponding to the M-ε-d-CSAs problem instance, the partial advisory that stays at C from time step 0 until time step k − 1 and then changes to C′ at time step k must be cost-minimizing among all partial advisories starting in C and ending in C′ at time step k. Otherwise, a lower-cost second advisory for the M-ε-d-CSAs problem instance could be constructed by using this hypothetical other CSA cost-minimal partial advisory. Such a lower-cost second advisory would still meet the difference constraint (2.16) because that constraint is met by the portion of the advisory from time steps k to K. Overall, we note that the M-ε-d-CSAs-optimal second advisory achieves the CSA-minimal cost-so-far for partial advisories starting in C at time step 0 and ending in C′ at time step k.

Now we leverage these two properties to show that FBVISAS will return an optimal second advisory. Arguing by contradiction, we assume that FBVISAS returned a second advisory with a larger total advisory cost than is achieved by an M-ε-d-CSAs-optimal second advisory. We have already shown that an M-ε-d-CSAs-optimal second advisory achieves the CSA-minimal cost-so-far from C at time step 0 to C′ at time step k and the CSA-minimal cost-to-go from C′ at time step k to the end of the time horizon. We have assumed that the cost of this optimal second advisory, which is the sum of the cost-so-far and cost-to-go, is lower than the cost of the second advisory returned by FBVISAS. FBVISAS investigates configurations at time steps starting with those that have the lowest sum of the cost-so-far and cost-to-go, so C′ at time step k would have been investigated by FBVISAS before it found and returned the assumed higher-cost second advisory. However, if FBVISAS investigated C′ at time step k, it would have discovered that the advisory constructed from these two partial advisories met the problem constraints because the partial advisory from C′ at time step k meets the difference constraint on its own (as was shown earlier). Therefore, it would have returned this M-ε-d-CSAs-optimal second advisory, a result that contradicts our assumption that FBVISAS returned a second advisory with a larger total advisory cost than is achieved by an M-ε-d-CSAs-optimal second advisory. Therefore, when one exists, FBVISAS returns an optimal second advisory for simple reconfiguration cost-dominated problem instances.

E.2 Simple Static Cost-Dominated Problem Instances with d > 1

Definition 3 (static cost-dominated M-ε-d-CSAs problem instance) Static cost-dominated M-ε-d-CSAs problem instances are those in which β^R = 0.

An important characteristic of static cost-dominated instances is that if there is a static cost benefit in a single time step of using one configuration over another, the reconfiguration cost required to achieve this cost benefit by reconfiguring is never prohibitive. In these instances, it is always best to use a configuration with the minimum static cost at each time step.

Lemma 2 The single optimal first advisory for simple static cost-dominated problem instances uses the unique configuration with the lowest static cost at each time step k ∈ {1, ..., K}.

Proof of Lemma 2 The definition of simple instances ensures that the optimal first advisory is unique. The definition of static cost-dominated instances ensures that lower advisory costs can always be achieved by reconfiguring as much as is required to achieve the lowest static cost in each next time step. Taken together, these definitions imply that there must be a unique configuration achieving the lowest static cost in each time step, and the unique optimal first advisory uses this configuration at each time step.

Proposition 2 If d > 1, FBVISAS does not return a second advisory for simple static cost-dominated problem instances.

Proof of Proposition 2 Consider partial advisories to any configuration C at any time step k from the initial configuration at k = 0. For simple static cost-dominated instances, the CSA-minimum cost partial advisory will use the unique static cost-minimizing configuration from time steps 1 to k − 1. Consider partial advisories from any configuration C at any time step k to time step K. For simple static cost-dominated instances, the CSA-minimum cost partial advisory will use the unique static cost-minimizing configuration from time steps k + 1 to K. Two such corresponding partial advisories are combined into a single advisory that is evaluated by FBVISAS when it investigates any configuration C at any time step k. Such investigated advisories never achieve a difference from the optimal first advisory of more than 1 because they only diverge from the optimal advisory at time step k. Therefore, the potential second advisories investigated by FBVISAS all fail to meet the difference constraint (2.16) and so no second advisory is returned.

Appendix F

Forward Distinct A∗ Algorithm

The Forward Distinct A∗ algorithm (FDA∗) is specified in Algorithm 7. This algorithm specification uses φ_max, which is the maximum possible configuration difference:

φ_max = max_{k ∈ {1, 2, ..., K}}  max_{C_k, C′_k ∈ C_k}  φ(C_k, C′_k).

For the configuration difference metric (2.18) used in Chapter 2, φ_max = 1.


Algorithm 7 FDA∗(C, T, J∗, C^1, λ)
Require: C = {C_k}_{k=0}^{K} {Valid configuration schedule advisories}
Require: T {Traffic situation data}
Require: J∗ {Minimum cost for corresponding CSA problem instance}
Require: C^1 {First advisory for M-ε-d-CSAs problem instance}
Require: λ {Algorithm parameter}
  closed ← ∅
  open ← priority queue containing C_0 with key 0
  C_k ← minimum-key configuration in open
  while C_k ∉ C_K do
    Add C_k to closed
    for C_{k+1} ∈ C_{k+1} do
      J ← J̄_k(C_k) + g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1})
      P^1 ← P̄^1_k(C_k) + φ(C_{k+1}, C^1_{k+1})
      R ← J + λ((k + 1)φ_max − P^1)
      if C_{k+1} ∈ open and R ≤ R̄(C_{k+1}) then
        Remove C_{k+1} from open
      if C_{k+1} ∉ open then
        R̄_{k+1}(C_{k+1}) ← R
        J̄_{k+1}(C_{k+1}) ← J
        P̄^1_{k+1}(C_{k+1}) ← P^1
        C†_k(C_{k+1}) ← C_k
        R̄(C_{k+1}) ← R̄_{k+1}(C_{k+1}) + Ĵ_{k+1}(C_{k+1})
        Add C_{k+1} to open with key R̄(C_{k+1})
    C_k ← minimum-key configuration in open
  Construct C^2 from C_k by iteratively using C†_{k−1}(C_k)
  return C^2

Appendix G

Properties of FDA∗

Definition 4 (d-second-CSA problem) Consider M-ε-d-CSAs instances in which M = 2, ε = ∞, and there is a unique optimal first advisory (|C∗_CSA(C, T)| = 1). For such instances, finding a second advisory involves solving

minimize    g(C^2, T)
subject to  C^2_k ∈ C_k,  k = 0, 1, 2, ..., K
            Φ(C^1, C^2) ≥ d,

which will be referred to as the d-second-CSA problem.

Lemma 3 If λ ≥ 0 and Ĵ_k(C_k) is an underestimate of the remaining cost

Σ_{k′=k+1}^{K} g_{k′}(C_{k′−1}, T_{k′−1}, C_{k′}, T_{k′})

of a partial advisory starting from C_k, then the advisory returned by FDA∗ minimizes the Lagrangian of the d-second-CSA problem.

Proof of Lemma 3 The FDA∗ algorithm is the forward A∗ algorithm, but with an advisory rank function R_k used instead of the cost J̄_k. The advisory rank function can be computed as a sum of the configuration rank r_k over the configurations in an advisory or partial advisory. This configuration rank function involves both the cost g_k as well as the configuration difference between C_k and C^1_k:

R_k({C_{k′}}_{k′=0}^{k}, {C^1_{k′}}_{k′=0}^{k}, T, λ) = Σ_{k′=1}^{k} r_{k′}(C_{k′−1}, C_{k′}, C^1_{k′}, T_{k′−1}, T_{k′}, λ)
    = Σ_{k′=1}^{k} [g_{k′}(C_{k′−1}, T_{k′−1}, C_{k′}, T_{k′}) + λ(φ_max − φ(C_{k′}, C^1_{k′}))].

The contribution to the rank from each configuration in the advisory is nonnegative when λ ≥ 0 because g_k(·, ·, ·, ·) ≥ 0 and because φ_max ≥ φ(·, ·). Since FDA∗ is simply the A∗ algorithm with a nonnegative configuration rank as the configuration cost, if FDA∗ uses an underestimate of the rank that will be incurred by the remainder of an advisory starting at C_{k+1} when computing R̄(C_{k+1}), then it will return a second advisory that minimizes the sum of the configuration rank. This is indeed the case when Ĵ_k(C_k) is an underestimate, as assumed. Let C̃^2 be the advisory rank-minimizing advisory returned by FDA∗. Next, consider a modified configuration rank function

r′_k(C_{k−1}, C_k, C^1_k, T_{k−1}, T_k, λ) = r_k(C_{k−1}, C_k, C^1_k, T_{k−1}, T_k, λ) − λφ_max + λd/K.

Since the additional terms are both constants, C̃^2 minimizes the sum of the modified rank as well. An A∗-based algorithm cannot be used directly to minimize this modified rank because it can be negative, which is why FDA∗ minimizes the unmodified rank, which is nonnegative. We know that C̃^2 will minimize both the sum of the unmodified rank and the sum of the modified rank because each advisory has the same number of configurations. The sum of the modified rank of the configurations in an advisory is

Σ_{k=1}^{K} r′_k(C_{k−1}, C_k, C^1_k, T_{k−1}, T_k, λ)
    = Σ_{k=1}^{K} [r_k(C_{k−1}, C_k, C^1_k, T_{k−1}, T_k, λ) − λφ_max + λd/K]
    = Σ_{k=1}^{K} [g_k(C_{k−1}, T_{k−1}, C_k, T_k) + λ(φ_max − φ(C_k, C^1_k)) − λφ_max + λd/K]
    = Σ_{k=1}^{K} [g_k(C_{k−1}, T_{k−1}, C_k, T_k) − λφ(C_k, C^1_k) + λd/K]
    = g(C^2, T) − λΦ(C^2, C^1) + λd
    = g(C^2, T) + λ(d − Φ(C^1, C^2)).    (G.1)

For any C^2 ∈ C and λ ≥ 0, we can specify the Lagrangian L(C^2, λ) for the d-second-CSA problem as

L(C^2, λ) = g(C^2, T) + λ(d − Φ(C^1, C^2)).

This is identical to (G.1), so FDA∗ finds a C̃^2 ∈ C that minimizes the Lagrangian of the d-second-CSA problem.

Lemma 3 means that FDA∗ implements the Lagrange dual function h(λ) = min_{C^2 ∈ C} L(C^2, λ) for the d-second-CSA problem. By weak duality we know that h(λ) provides a lower bound on the minimum cost for the primal d-second-CSA problem for any λ ≥ 0.

This Lemma is possible because the assumptions that |C∗_CSA(C, T)| = 1 and M = 2 eliminate the combinatorial nature of the M-ε-d-CSAs problem. The difference function (2.17) decomposes as a sum over the differences between the two advisories at each time step, so the A∗ algorithm still minimizes the cost after the difference constraint is incorporated into it (see Appendix A). In fact, the d-second-CSA problem is a type of constrained shortest path problem. Related algorithms also leverage Lagrange duality to solve constrained shortest path problems [42, 69].

Proposition 3 Suppose M = 2, ε = ∞, there is a unique optimal first advisory (|C∗_CSA(C, T)| = 1), and Ĵ_k(C_k) is an underestimate of the remaining cost. If λ∗ is optimal for the Lagrange dual problem of the d-second-CSA problem (i.e., it maximizes h(λ) over all λ ≥ 0) and strong duality is satisfied, then the FDA∗ algorithm with λ = λ∗ returns an advisory that achieves the same Lagrangian value as any second advisory that is optimal for the d-second-CSA problem.

Proof of Proposition 3 Let C^{2,∗} be any second advisory that is optimal for the d-second-CSA problem. Strong duality is satisfied (by assumption), which means that g(C^{2,∗}, T) = h(λ∗) and that C^{2,∗} minimizes L(C^2, λ∗). By Lemma 3, we know that FDA∗ also returns a second advisory that minimizes L(C^2, λ). Therefore, when λ = λ∗, the advisory returned by FDA∗ and C^{2,∗} must achieve the same minimum Lagrangian value.

This result can be extended to a problem like the d-second-CSA problem but with ε ∈ [0, ∞). This is referred to as the ε-d-second-CSA problem.

Definition 5 (ε-d-second-CSA problem) Consider M-ε-d-CSAs instances in which M = 2, ε ∈ [0, ∞), and there is a unique optimal first advisory (|C∗_CSA(C, T)| = 1). For such instances, finding a second advisory involves solving

minimize    g(C^2, T)
subject to  C^2_k ∈ C_k,  k = 0, 1, 2, ..., K
            (g(C^2, T) − g(C^1, T)) / g(C^1, T) ≤ ε,    (G.2)
            Φ(C^1, C^2) ≥ d,    (G.3)

which will be referred to as the ε-d-second-CSA problem.

Let λ_1 be the dual variable corresponding to constraint (G.2) and λ_2 be the dual variable corresponding to constraint (G.3).

Corollary 1 Suppose M = 2, there is a unique optimal first advisory (|C∗_CSA(C, T)| = 1), and Ĵ_k(C_k) is an underestimate of the remaining cost. If λ∗ = [λ∗_1, λ∗_2] is optimal for the Lagrange dual problem of the ε-d-second-CSA problem and strong duality is satisfied, then the FDA∗ algorithm with λ = λ∗_2 / (1 + λ∗_1) returns an advisory that achieves the same Lagrangian value as any second advisory that is optimal for the ε-d-second-CSA problem.

Proof of Corollary 1 The Lagrangian for the ε-d-second-CSA problem is

L(C^2, λ) = g(C^2, T) + λ_1(g(C^2, T) − (1 + ε)J∗) + λ_2(d − Φ(C^1, C^2))
          = (1 + λ_1) g(C^2, T) − λ_1(1 + ε)J∗ + λ_2(d − Φ(C^1, C^2)).    (G.4)

The middle term is not impacted by C^2. Furthermore, since 1 + λ_1 ≥ 0, minimizing the Lagrangian (G.4) over C^2 ∈ C is the same as finding the advisory that minimizes

g(C^2, T) + (λ_2 / (1 + λ_1)) (d − Φ(C^1, C^2)).

When λ = λ_2 / (1 + λ_1), this is the cost function for which FDA∗ finds an optimal advisory. The result can be proven by using the techniques used in the proofs of Lemma 3 and Proposition 3.

These results motivate the FDA∗ algorithm because under the right circumstances, it returns a second advisory that satisfies a condition that is necessarily satisfied by optimal second advisories. Aside from the insufficiency of this condition, there are some important qualifications to this motivation. The M-ε-d-CSAs problem instances that give rise to the d-second-CSA and ε-d-second-CSA problems are only a subset of all M-ε-d-CSAs problem instances. We also have no reason to believe that strong duality will hold for second-CSA problem instances. For cases when strong duality does not hold, algorithms developed for constrained shortest path problems could be deployed to find lower-cost second advisories [42]. Given that we are interested in quickly finding advisories for instances in which M > 2, we have not pursued such approaches. Finally, Proposition 3 and Corollary 1 require the value of λ∗ or λ∗_2 / (1 + λ∗_1), respectively. For the d-second-CSA problem, λ∗ can be found because an implication of Lemma 3 is that by searching over λ ≥ 0 and repeatedly calling FDA∗, we can solve the dual of the d-second-CSA problem:

maximize    h(λ)
subject to  λ ≥ 0.

Since h(λ) is concave and being maximized over a convex set, this is a convex problem with a single real-valued variable. It could be solved with a simple bisection search in which FDA∗ is called at each candidate λ value to determine the value of h(λ). Finding the optimal dual variable in this way is not pursued in this research because such repeated calls of the FDA∗ algorithm are computationally prohibitive for our application and because we are interested in problem instances for which M > 2. These considerations led to the development of the FDA∗-SC algorithm.

Appendix H

Forward Distinct A∗ with Shortcuts Algorithm

The Forward Distinct A∗ with Shortcuts algorithm (FDA∗-SC) is specified in Algorithm 8. For FDA∗-SC, the open priority queue functionality must be adjusted. The queue key is set up such that any configuration labeled as a shortcut is lower than any configuration not labeled as a shortcut. Among shortcut configurations or non-shortcut configurations, the key works as usual (lower keys given higher priority than higher keys). Therefore, if shortcut configurations are in the queue, then the minimum-key configuration is the shortcut configuration with the lowest key.

If P̂^m(C_k) is always an underestimate of the difference-to-go, then the algorithm will not incorrectly try to take a shortcut with partial advisories that will never become distinct enough from advisory m. We define P̂^m(C_k) ≜ 0 for all m and C_k to ensure that this is an underestimate. The upper difference bound P̌^m(C_k) ensures that only configurations that can possibly achieve sufficient distinctness are added to the open queue. For the difference metric (2.17) used in Chapter 2, P̌^m(C_k) = K − k works because at most one unit of difference can be earned in each remaining time step. Finally, Φ_max is the maximum possible advisory difference metric value.
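This adjusted queue ordering can be sketched with a composite key in a standard binary heap: a leading flag makes every shortcut entry outrank every non-shortcut entry, and ties within each class fall back to the usual key. The entries below are hypothetical toy data, not the dissertation's implementation.

```python
import heapq

open_queue = []

def push(config, key, is_shortcut):
    """Shortcut entries (flag 0) always sort before non-shortcut entries (flag 1);
    within each class, lower keys have higher priority."""
    heapq.heappush(open_queue, ((0 if is_shortcut else 1, key), config))

push("C_a", 5.0, False)
push("C_b", 9.0, True)    # a shortcut jumps ahead of all non-shortcuts
push("C_c", 1.0, False)
push("C_d", 7.0, True)

order = [heapq.heappop(open_queue)[1] for _ in range(4)]
print(order)  # shortcuts first by key, then the rest by key
```

Because Python compares tuples lexicographically, no separate two-queue bookkeeping is needed: the minimum-key pop is automatically the lowest-key shortcut whenever any shortcut is queued.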


Algorithm 8 FDA∗-SC(C, T, J∗, ε, d, C^M, λ, {J̄_k(C_k)}_{k=0}^{K−1}, {C̄†_{k+1}(C_k)}_{k=0}^{K−1}, ε′)
Require: C = {C_k}_{k=0}^{K}, T {Constraints and traffic data from corresponding CSA instance}
Require: J∗ {Minimum cost for corresponding CSA problem instance}
Require: ε, d {Parameters in M-ε-d-CSAs problem instance specification}
Require: C^M {Advisories found so far for M-ε-d-CSAs problem instance}
Require: λ, ε′ {Algorithm parameters}
Require: {J̄_k(C_k)}_{k=0}^{K−1}, {C̄†_{k+1}(C_k)}_{k=0}^{K−1} {Partial advisory specifications and costs}
  m ← |C^M| + 1
  closed ← ∅
  open ← priority queue containing C_0 with key 0
  C_k ← minimum-key configuration in open
  while not (C_k ∈ C_K and (J̄_K(C_K) − J∗)/J∗ ≤ ε and min_{m′ ∈ {1,...,m−1}} P̄^{m′}_K(C_K) ≥ d) and C_k not a shortcut do
    Add C_k to closed
    for C_{k+1} ∈ C_{k+1} do
      J ← J̄_k(C_k) + g_{k+1}(C_k, T_k, C_{k+1}, T_{k+1})
      for m′ = 1, ..., m − 1 do
        P^{m′} ← P̄^{m′}_k(C_k) + φ(C_{k+1}, C^{m′}_{k+1})
      if (J + Ĵ_{k+1}(C_{k+1}) − J∗)/J∗ ≤ ε and min_{m′ ∈ {1,...,m−1}} [P^{m′} + P̌^{m′}(C_{k+1})] ≥ d then
        R ← J/J∗ + (λ/(m − 1)) Σ_{m′=1}^{m−1} ((k + 1)φ_max − P^{m′})/(Φ_max − d + 1)
        if C_{k+1} ∈ open and R ≤ R̄(C_{k+1}) then
          Remove C_{k+1} from open
        if C_{k+1} ∉ open then
          R̄(C_{k+1}) ← R
          J̄_{k+1}(C_{k+1}) ← J
          P̄^{m′}_{k+1}(C_{k+1}) ← P^{m′} for m′ = 1, ..., m − 1
          C†_k(C_{k+1}) ← C_k
          Add C_{k+1} to open with key R̄(C_{k+1})
    C_k ← minimum-key configuration in open
  Construct C^m from C_k by iteratively using C†_{k−1}(C_k) (and C̄†_{k+1} beyond a shortcut)
  return C^m

Bibliography

[1] Interface control document for substitutions during Ground Delay Programs, Ground Stops, and Airspace Flow Programs. http://www.fly.faa.gov/Products/NASDOCS/nasdocs.jsp, November 2006.

[2] The return on investment of U.S. business travel. Report prepared for the U.S. Travel Association and the Destination & Travel Foundation, Oxford Eco- nomics USA, 2009.

[3] ICAO envinonmental report 2010: Aviation and climate change. Technical report, International Civil Aviation Organization, 2010.

[4] Global market forecast: Future journeys 2013 2032. Technical report, Airbus, 2013.

[5] statsmodels. http://statsmodels.sourceforge.net/,2013.

[6] Autonomy research for civil aviation: Toward a new era of flight. Prepubli- cation draft of report—subject to further editorial correction, Committee on Autonomy Research for Civil Aviation, National Research Council, June 2014.

[7] Current market outlook 2014–2033. Technical report, Boeing,2014.

[8] The economic impact of civil aviation on the U.S. economy. Technicalreport, Federal Aviation Administration, June 2014.

[9] FAA aerospace forecast fiscal years 2014–2034. Technical report, Federal Avia- tion Administration, 2014.


[10] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proc. of International Conference on Machine Learning, Banff, Canada, 2004.

[11] Air Traffic Control System Command Center. Select items from the advisories database. http://www.fly.faa.gov/adv/advAdvisoryForm.jsp, 2015.

[12] April K. Andreas and J. Cole Smith. Exact algorithms for robust k-path routing problems. In International Workshop on Global Optimization, Almería, Spain, September 2005.

[13] April K. Andreas, J. Cole Smith, and Simge Küçükyavuz. Branch-and-price-and-cut algorithms for solving the reliable h-paths problem. Journal of Global Optimization, 42(4), December 2008.

[14] Michael Ball, Cynthia Barnhart, Martin Dresner, Mark Hansen, Kevin Neels, Amedeo Odoni, Everett Peterson, Lance Sherry, Antonio Trani, and Bo Zou. Total delay impact study. Technical report, The National Center of Excellence for Aviation Operations Research (NEXTOR), December 2010.

[15] Michael Ball, Cynthia Barnhart, George Nemhauser, and Amedeo Odoni. Air Transportation: Irregular Operations and Control, volume 14, chapter 1, pages 1–61. Elsevier, 2007.

[16] Michael Ball, Geir Dahl, and Thomas Vossen. Matchings in connection with Ground Delay Program planning. Networks, 53(3):293–306, 2009.

[17] Michael O. Ball, Robert Hoffman, Amedeo R. Odoni, and Ryan Rifkin. A stochastic integer program with dual network structure and its application to the ground-holding problem. Operations Research, 51(1):167–171, January–February 2003.

[18] Cynthia Barnhart, Douglas Fearing, and Vikrant Vaze. Modeling passenger travel and delays in the National Air Transportation System. Operations Research, 62(3):580–601, 2014.

[19] Roger Beatty, Rose Hsu, Lee Berry, and James Rome. Preliminary evaluation of flight delay propagation through an airline schedule. In USA/Europe Air Traffic Management Research & Development Seminar, Orlando, FL, October 1998.

[20] Richard Bellman and Robert Kalaba. On kth best policies. Journal of the SIAM, 8(4):582–588, December 1960.

[21] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Nashua, NH, 2005.

[22] Michael Bloem and Nicholas Bambos. Approximating the likelihood of historical airline actions to evaluate airline delay cost functions. In Proc. of IEEE Conference on Decision and Control, Maui, HI, December 2012. © 2012 IEEE.

[23] Michael Bloem and Nicholas Bambos. An approach for finding multiple area of specialization configuration advisories. In AIAA Aviation Technology, Integration, and Operations Conference, Los Angeles, CA, August 2013.

[24] Michael Bloem and Nicholas Bambos. Air traffic control area configuration advisories from near-optimal distinct paths. Journal of Aerospace Information Systems, 11(11):764–784, November 2014.

[25] Michael Bloem and Nicholas Bambos. Ground Delay Program analytics with behavioral cloning and inverse reinforcement learning. In AIAA Aviation Technology, Integration, and Operations Conference, Atlanta, GA, August 2014.

[26] Michael Bloem and Nicholas Bambos. Ground Delay Program analytics with behavioral cloning and inverse reinforcement learning. Journal of Aerospace Information Systems, 12(3):299–313, March 2015.

[27] Michael Bloem, Michael Drew, Chok Fung Lai, and Karl Bilimoria. Advisory algorithm for scheduling open sectors, operating positions, and workstations. AIAA Journal of Guidance, Control, and Dynamics, 37(4):1158–1169, July–August 2014.

[28] Michael Bloem, Michael C. Drew, Chok Fung Lai, and Karl Bilimoria. Advisory algorithm for scheduling open sectors, operating positions, and workstations. In AIAA Aviation Technology, Integration, and Operations Conference, Indianapolis, IN, September 2012.

[29] Michael Bloem and Pramod Gupta. Configuring airspace sectors with approximate dynamic programming. In Proc. of International Congress of the Aeronautical Sciences, Nice, France, September 2010.

[30] Michael Bloem, Pramod Gupta, and Parimal Kopardekar. Algorithms for combining airspace sectors. Air Traffic Control Quarterly, 17(3):245–268, 2009.

[31] Michael Bloem, David Hattaway, and Nicholas Bambos. Evaluation of algorithms for a miles-in-trail decision support tool. In International Conference on Research in Air Transportation, Berkeley, CA, May 2012.

[32] Michael Bloem and Haiyun Huang. Evaluating delay cost functions with airline actions in Airspace Flow Programs. In Proc. of USA/Europe Air Traffic Management Research & Development Seminar, Berlin, Germany, June 2011.

[33] Michael Bloem and Parimal Kopardekar. Combining airspace sectors for the efficient use of air traffic control resources. In AIAA Guidance, Navigation, and Control Conference and Exhibit, Honolulu, HI, August 2008.

[34] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[35] Michael Brennan. Simplified Substitutions – enhancements to substitution rules and procedures during Ground Delay Programs. In AGIFORS Airline Operations Meeting, Ocho Rios, Jamaica, May 2001.

[36] Rachel Burbidge, Alan Melrose, and Andrew Watt. Potential adaptation to impacts of climate change on air traffic management. In USA/Europe Air Traffic Management Research & Development Seminar, Berlin, Germany, June 2011.

[37] Bureau of Transportation Statistics. TransStats. http://www.transtats.bts.gov/, February 2015.

[38] Philip Butterworth-Hayes. Climate change and aviation: Forecasting the effects. Aerospace America, pages 30–34, November 2013.

[39] Gurkaran Buxi and Mark Hansen. Generating probabilistic capacity profiles from weather forecast: A design-of-experiment approach. In USA/Europe Air Traffic Management Research & Development Seminar, Berlin, Germany, June 2011.

[40] Thomas H. Byers and Michael S. Waterman. Determining all optimal and near-optimal solutions when solving shortest path problems by dynamic programming. Operations Research, 32(6):1381–1384, 1984.

[41] Mayte Cano, Pablo Sánchez-Escalonilla, and Manuel M. Dorado. Complexity analysis in the next generation of air traffic management system. In Proc. of AIAA/IEEE Digital Avionics Systems Conference, October 2007.

[42] W. Matthew Carlyle, Johannes O. Royset, and R. Kevin Wood. Lagrangian relaxation and enumeration for solving constrained shortest-path problems. Networks, 52(4):256–270, December 2008.

[43] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, January–June 2002.

[44] Thomas J. Colvin and Juan J. Alonso. Compact envelopes and SU-FARM for integrated air-and-space traffic management. In AIAA Aerospace Sciences Meeting, Kissimmee, FL, January 2015.

[45] Andrew Cook, Graham Tanner, and Stephen Anderson. Evaluating the true cost to airlines of one minute of airborne or ground delay. Final Report, Performance Review Commission, Eurocontrol, April 2004.

[46] Lara S. Cook and Bryan Wood. A model for determining Ground Delay Program parameters using a probabilistic forecast of stratus clearing. In USA/Europe Air Traffic Management Research & Development Seminar, Napa, CA, June 2009.

[47] Jonathan Cox and Mykel J. Kochenderfer. Optimization approaches to the single airport ground hold problem. In AIAA Guidance, Navigation, and Control Conference, Kissimmee, FL, January 2015.

[48] Jonathan Cunningham, Lara Cook, and Chris Provan. The utilization of current forecast products in a probabilistic airport capacity model. In AMS Annual Meeting, New Orleans, LA, January 2012.

[49] Department of Transportation. Beyond traffic 2045: Trends and choices. http://www.dot.gov/sites/dot.gov/files/docs/Draft_Beyond_Traffic_Framework.pdf, January 2015.

[50] Rahul Dhal, Sandip Roy, Christine Taylor, and Craig Wanke. Forecasting weather-impacted airport capacities for flow contingency management: Advanced methods and integration. In AIAA Aviation Technology, Integration, and Operations Conference, Los Angeles, CA, August 2013.

[51] Michael C. Drew. A method of optimally combining sectors. In AIAA Aviation Technology, Integration and Operations Conference, Hilton Head, SC, September 2009.

[52] S. E. Dreyfus. An appraisal of some shortest-path algorithms. Memorandum RM-5433-1-PR, United States Air Force Project RAND, September 1963.

[53] David Eppstein. Finding the k shortest paths. SIAM Journal of Computing, 28(2):652–673, 1998.

[54] Hakan Ergan. Lessons learned in implementing a slot optimizer during Ground Delay Programs. In Airline Group of the International Federation of Operational Research Societies (AGIFORS) Airline Operations Meeting, 2009.

[55] Douglas Fearing. The Case for Coordination: Equity, Efficiency and Passenger Impacts in Air Traffic Flow Management. PhD dissertation, Massachusetts Institute of Technology, September 2010.

[56] Federal Aviation Administration. Cleveland Air Route Traffic Control Center (ZOB) standard operating procedures, February 2012.

[57] Federal Aviation Administration. Order JO 7210.3X Facility operation and administration. http://www.faa.gov/air_traffic/publications/atpubs/FAC/index.htm, February 2012.

[58] Federal Aviation Administration. FAA operations & performance data. http://aspm.faa.gov/, October 2013.

[59] L. Fu, D. Sun, and L. R. Rilett. Heuristic shortest path algorithms for transportation applications: State of the art. Computers & Operations Research, 33:3324–3343, 2006.

[60] Huina Gao and George Hunter. Evaluation of user gaming strategies in the future National Airspace System. In AIAA Aviation Technology, Integration, and Operations Conference, Fort Worth, TX, September 2010.

[61] Huina Gao, George Hunter, Frank Berardino, and Karla Hoffman. Development and evaluation of market-based traffic flow management concepts. In AIAA Aviation Technology, Integration, and Operations Conference, Fort Worth, TX, September 2010.

[62] Michel X. Goemans. Lecture notes on bipartite matching. http://math.mit.edu/~goemans/18433S09/matching-notes.pdf, February 2009.

[63] GRA, Incorporated. Economic values for FAA investment and regulatory decisions, a guide. Technical Report Contract No. DTFA 01-02-C00200, FAA Office of Aviation Policy and Plans, Washington, DC, October 2007.

[64] Shon Grabbe, Banavar Sridhar, and Avijit Mukherjee. Similar days in the NAS: an airport perspective. In AIAA Aviation Technology, Integration, and Operations Conference, September 2013.

[65] Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming. Web page and software, http://cvxr.com/cvx, January 2009.

[66] Daniel Gross. Seven wonders of the modern world. http://www.slate.com/articles/technology/the_back_end/2014/10/the_new_seven_wonders_defining_the_top_technological_marvels_of_the_contemporary.html, December 2014.

[67] Lourdmareddy Gumireddy and Ilhan Ince. Optimization for hierarchical objectives during Ground Delay Programs. In Airline Group of the International Federation of Operational Research Societies (AGIFORS) Airline Operations Meeting, Rome, Italy, 2002.

[68] Pramod Gupta, Michael Bloem, and Parimal Kopardekar. An investigation of the operational acceptability of algorithm-generated sector combinations. In AIAA Aviation Technology, Integration and Operations Conference, Hilton Head, SC, September 2009.

[69] Gabriel Y. Handler and Israel Zang. A dual algorithm for the constrained shortest path problem. Networks, 10:293–310, 1980.

[70] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, July 1968.

[71] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.

[72] Walter Hoffman and Richard Pavley. A method for the solution of the nth best path problem. Journal of the ACM, 6(4):506–514, October 1959.

[73] Debra Hoitomt, David Kraay, and Boaxing Tang. Flight sequencing under the FAA's Simplified Substitution rules. In Airline Group of the International Federation of Operational Research Societies (AGIFORS) Airline Operations Meeting, Istanbul, Turkey, April 1999.

[74] Mark Huber. FAA budget: Agency struggles to manage resources. http://www.ainonline.com/aviation-news/aviation-international-news/2013-07-01/faa-budget-agency-struggles-manage-resources, July 2013.

[75] Emilio Iasiello. Getting ahead of the threat: Aviation and cyber security. Aerospace America, pages 22–25, July–August 2013.

[76] Husni Idris, Antony Evans, Robert Vivona, Jimmy Krozel, and Karl Bilimoria. Field observations of interactions between traffic flow management and airline operations. In AIAA Aviation Technology, Integration and Operations Conference, Wichita, KS, September 2006.

[77] Karsten Jeschkies. SMOTE implementation for over-sampling. http://comments.gmane.org/gmane.comp.python.scikit-learn/5278, November 2012.

[78] Víctor M. Jiménez and Andrés Marzal. Computing the k shortest paths: A new algorithm and an experimental comparison. In Jeffrey Vitter and Christos Zaroliagis, editors, Algorithm Engineering, volume 1668 of Lecture Notes in Computer Science, pages 15–29. Springer Berlin / Heidelberg, 1999.

[79] Jaewoo Jung, Paul Lee, Angela Kessell, Jeff Homola, and Shannon Zelinski. Effect of dynamic sector boundary changes on air traffic controllers. In AIAA Guidance, Navigation, and Control Conference, Toronto, Canada, August 2010.

[80] Abdul Qadar Kara, John Ferguson, Karla Hoffman, and Lance Sherry. Estimating domestic US airline cost of delay based on European model. In International Conference on Research in Air Transportation, Budapest, Hungary, June 2010.

[81] Amy Kim and Mark Hansen. Deconstructing delay: A non-parametric approach to analyzing delay changes in single server queuing systems. Transportation Research Part B, 58:119–133, December 2013.

[82] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. A cascaded supervised learning approach to inverse reinforcement learning. In Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný, editors, Machine Learning and Knowledge Discovery in Databases, volume 8188 of Lecture Notes in Computer Science, pages 1–16. Springer, September 2013.

[83] Jon Kleinberg and Éva Tardos. Algorithm Design. Addison Wesley, 2005.

[84] Mykel J. Kochenderfer. Decision Making Under Uncertainty: Theory and Applications. The MIT Press, Cambridge, MA, 2015.

[85] Parimal Kopardekar, Albert Schwartz, Magyarits, and Jessica Rhodes. Airspace complexity measurement: An air traffic control simulation analysis. In USA/Europe Air Traffic Management Research & Development Seminar, Barcelona, Spain, July 2007.

[86] Deepak Kulkarni, Yao Wang, and Banavar Sridhar. Data mining for understanding and improving decision-making affecting Ground Delay Programs. In Proc. of AIAA/IEEE Digital Avionics Systems Conference, Syracuse, NY, October 2013.

[87] Paul U. Lee, Richard Mogford, Wayne Bridges, Nathan Buckley, Mark Evans, Vimmy Gujral, Hwasoo Lee, Daniel Peknik, and William Preston. An evaluation of Operational Airspace Sectorization Integrated System (OASIS) advisory tool. In AIAA Aviation Technology, Integration, and Operations Conference, Los Angeles, CA, August 2013.

[88] Pei-Chen Liu. Managing Uncertainty in the Single Airport Ground Holding Problem Using Scenario-based and Scenario-free Approaches. PhD dissertation, University of California, Berkeley, CA, 2007.

[89] Yi Liu and Mark Hansen. Ground Delay Program decision-making using multiple criteria: A single airport case. In USA/Europe Air Traffic Management Research & Development Seminar, Chicago, IL, June 2013.

[90] Ruen-Chze Loh, Sieteng Soh, and Mihai Lazarescu. An approach to find maximal disjoint paths with reliability and delay constraints. In Proc. of IEEE International Conference on Advanced Information Networking and Applications, June 2009.

[91] Ruen Chze Loh, Sieteng Soh, and Mihai Lazarescu. Edge disjoint paths with minimum delay subject to reliability constraint. In Proc. of IEEE Asia-Pacific Conference on Communications, Shanghai, China, October 2009.

[92] Songjun Luo and Gang Yu. Airline schedule perturbation problem: Landing and takeoff with nonsplittable resource for the Ground Delay Program. In Gang Yu, editor, Operations Research in the Airline Industry, chapter 14, pages 404–432. Springer, 1998.

[93] Paul Marjoram, John Molitor, Vincent Plagnol, and Simon Tavaré. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America, 100(26):15324–15328, December 2003.

[94] Ernesto de Queirós Vieira Martins, Marta Margarida Braz Pascoal, and José Luis Esteves dos Santos. Deviation algorithms for ranking shortest paths. International Journal of Foundations of Computer Science, 10(3):247–262, 1999.

[95] Avijit Mukherjee, Shon Grabbe, and Banavar Sridhar. Predicting Ground Delay Program at an airport based on meteorological conditions. In AIAA Aviation Technology, Integration, and Operations Conference, Atlanta, GA, June 2014.

[96] Andrew Ng. Advice for applying machine learning. http://cs229.stanford.edu/materials/ML-advice.pdf, 2011.

[97] Tuan Nguyen, Minh Do, Alfonso E. Gerevini, Ivan Serina, Biplav Srivastava, and Subbarao Kambhampati. Generating diverse plans to handle unknown and partially known user preferences. Artificial Intelligence, 190:1–31, October 2012.

[98] Timothy J. Niznik. Optimizing the airline response to Ground Delay Programs. In AGIFORS Airline Operations Meeting, Ocho Rios, Jamaica, May 2001.

[99] OAG. Aircraft information. http://www.oag.com/NorthAmerica/airlineandairport/aircraftstatistics.asp, 2010.

[100] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, January–December 2011.

[101] Chao Peng and Hong Shen. An improved approximation algorithm for computing disjoint QoS-paths. In Proc. of IEEE International Conference on Systems and Networking, April 2006.

[102] Christopher A. Provan, Lara Cook, and Jon Cunningham. A probabilistic airport capacity model for improved Ground Delay Program planning. In Proc. of AIAA/IEEE Digital Avionics Systems Conference, Seattle, WA, October 2011.

[103] Purdue Engineering and NASA. National airspace system uncertainty repository. https://engineering.purdue.edu/nasur/, 2015.

[104] Varun Ramanujam and Hamsa Balakrishnan. Estimation of maximum-likelihood discrete-choice models of the runway configuration selection process. In Proc. of American Control Conference, June 2011.

[105] Varun Ramanujam and Hamsa Balakrishnan. Data-driven modeling of the airport configuration selection process. IEEE Transactions on Human-Machine Systems, (99):1–10, April 2015.

[106] Esa Rantanen. Development and validation of objective performance and workload measures in air traffic control. Technical Report AHFD-04-19/FAA-04-7, Aviation Human Factors Division, Institute of Aviation, University of Illinois, September 2004.

[107] Nathan Ratliff, Brian Ziebart, Kevin Peterson, J. Andrew Bagnell, and Martial Hebert. Inverse optimal heuristic control for imitation learning. Paper 48, Carnegie Mellon University Robotics Institute, 2009.

[108] Nathan D. Ratliff, J. Andrew Bagnell, and Martin A. Zinkevich. Maximum margin planning. In Proc. of International Conference on Machine Learning, Pittsburgh, PA, 2006.

[109] Hayley J. Davison Reynolds, Rich DeLaura, Joseph C. Venuti, and Marilyn M. Wolfson. Uncertainty and decision making in air traffic management. In AIAA Aviation Technology, Integration, and Operations Conference, Los Angeles, CA, August 2013.

[110] Joseph Rios. Aggregate statistics of national traffic management initiatives. In AIAA Aviation Technology, Integration, and Operations Conference, Fort Worth, TX, October 2010.

[111] Michael J. Sammartino. Advisory circular: Airspace Flow Program. http://www.fly.faa.gov/FAQ/Acronyms/circulars/AC90-102_AFP.pdf, May 2006.

[112] Andreas Schafer and David G. Victor. The future mobility of the world population. Transportation Research Part A, 34:171–205, 2000.

[113] Hanif D. Sherali, Raymond W. Staats, and Antonio A. Trani. An airspace-planning and collaborative decision-making model: Part II – cost model, data considerations, and computations. Transportation Science, 40(2):147–164, May 2006.

[114] Lara Shisler, Christopher Provan, David A. Clark, William N. Chan, Shon Grabbe, Kenneth Venzke, Christine Riley, Dan Gilani, and Ed Corcoran. An operational evaluation of the Ground Delay Program parameters selection model (GPSM). In USA/Europe Air Traffic Management Research & Development Seminar, Chicago, IL, June 2013.

[115] David A. Smith and Lance Sherry. Decision support tool for predicting aircraft arrival rates, Ground Delay Programs, and airport delays from weather forecasts. In International Conference on Research in Air Transportation, Fairfax, VA, February 2008.

[116] Banavar Sridhar, Shon R. Grabbe, and Avijit Mukherjee. Modeling and optimization in traffic flow management. Proceedings of the IEEE, 96(12), December 2008.

[117] Raymond William Staats. An Airspace Planning and Collaborative Decision Making Model Under Safety, Workload, and Equity Considerations. PhD thesis, Virginia Polytechnic Institute and State University, April 2003.

[118] J. W. Suurballe. Disjoint paths in a network. Networks, 4:125–145, 1974.

[119] J. W. Suurballe and R. E. Tarjan. A quick method for finding shortest pairs of disjoint paths. Networks, 14:325–336, 1984.

[120] Shin-Lai Tien. Demand-Responsive Airspace Sectorization and Air Traffic Controller Staffing. PhD dissertation, University of Maryland, College Park, MD, 2010.

[121] Shin-Lai Tien, Robert Hoffman, and Paul Schonfeld. En route sector combination scheme to minimize air traffic controller staffing. In Proc. of Transportation Res. Board Annual Meeting, Washington, DC, January 2012.

[122] Shin-Lai (Alex) Tien and Robert Hoffman. Optimizing airspace sectors for varying demand patterns using multi-controller staffing. In USA/Europe Air Traffic Management Research & Development Seminar, Napa, CA, June 2009.

[123] Luca Trevisan. Inapproximability of combinatorial optimization problems. Technical Report TR04–065, Electronic Colloquium on Computational Complexity, July 2004.

[124] Alberto Vasquez-Marquez. American Airlines Arrival Slot Allocation System (ASAS). Interfaces, 21(1), January–February 1991.

[125] Thomas Vossen and Michael Ball. Optimization and mediated bartering models for Ground Delay Programs. Naval Research Logistics, 53:75–90, 2006.

[126] Thomas W. M. Vossen and Michael O. Ball. Slot trading opportunities in collaborative Ground Delay Programs. Transportation Science, 2005.

[127] Yao Wang. Prediction of weather impacted airport capacity using ensemble learning. In Proc. of AIAA/IEEE Digital Avionics Systems Conference, Seattle, WA, October 2011.

[128] Yao Wang. Prediction of weather impacted airport capacity using RUC-2 forecast. In Proc. of AIAA/IEEE Digital Avionics Systems Conference, Williamsburg, VA, October 2012.

[129] Yao Wang and Deepak Kulkarni. Modeling weather impact on Ground Delay Programs. SAE Journal of Aerospace, 4(2):1207–1215, November 2011.

[130] Shawn R. Wolfe and Joseph L. Rios. A method for using historical Ground Delay Programs to inform day-of-operations programs. In AIAA Guidance, Navigation, and Control Conference, Portland, OR, August 2011.

[131] Jing Xiong. Revealed Preference of Airlines' Behavior under Air Traffic Management Initiatives. PhD thesis, University of California, Berkeley, 2010.

[132] Arash Yousefi, Robert Hoffman, Marcus Lowther, Babak Khorrami, and Herbert Hackney. Trigger metrics for dynamic airspace configuration. In AIAA Aviation Technology, Integration, and Operations Conference, Hilton Head, SC, September 2009.