UPTEC F 10 045 Examensarbete 30 hp Juli 2010

Towards the Solution of Large-Scale and Stochastic Traffic Network Design Problems

Fredrik Hellman

Abstract

Teknisk-naturvetenskaplig fakultet, UTH-enheten

This thesis investigates the second-best toll pricing and capacity expansion problems when stated as mathematical programs with equilibrium constraints (MPEC). Three main questions are raised: first, whether conventional descent methods give sufficiently good solutions, or whether global solution methods are preferable; second, how the performance of the considered solution methods scales with network size; and third, how a discretized stochastic mathematical program with equilibrium constraints (SMPEC) formulation of a stochastic network design problem can be solved in practice. An attempt to answer these questions is made through a series of numerical experiments.

The traffic system is modeled using Wardrop's first principle for user behavior, with separable cost functions of BPR and TU71 type. Elastic demand is also considered for some problem instances.

Two already developed solution approaches are considered: implicit programming and a cutting constraint algorithm. For the implicit programming approach several methods, both local and global, are applied, and for the traffic assignment problem an implementation of the disaggregate simplicial decomposition (DSD) method is used.

Regarding the first question, concerning local and global methods, our results do not give a clear answer. The results from numerical experiments with both approaches on networks of different sizes show that the implicit programming approach has the potential to solve large-scale problems, while the cutting constraint algorithm scales worse with network size. Also for the stochastic extension of the network design problem, the numerical experiments indicate that implicit programming is a good approach. Further, a number of theorems providing sufficient conditions for strong regularity of the traffic assignment solution mapping for OD connectors and BPR cost functions are given.

Supervisor: Michael Patriksson
Subject reviewer: Per Lötstedt
Examiner: Tomas Nyberg
ISSN: 1401-5757, UPTEC F 10 045

Acknowledgements

I would like to express my appreciation and gratitude to my supervisor, Prof. Michael Patriksson, for giving me the opportunity to write this thesis and for improving the final report by proofreading and through inspiring conversations. I also want to acknowledge Dr. Christoffer Cromvik for helping me in the initial phase, Prof. Clas Rydergren and Joakim Ekström at Linköping University for providing me with data on the Stockholm network, and Dr. Napsu Karmitsa at the University of Turku for providing the source code for LMBM-B. I want to thank my new friends at the Department of Mathematical Sciences at Chalmers for this year and all the nice lunch and coffee breaks. My gratitude also goes to my family and friends for their support during this process, which lasted longer than planned. I am very grateful to my beloved Hanna for giving me great support all the way, despite us being literally parted by an ocean.

Contents

Acronyms

Glossary of Notation

1 Introduction

2 Contribution

3 Solution Approaches

4 Traffic Assignment Problem
  4.1 Model Description
  4.2 As Solution to an Optimization Problem
  4.3 System Optimal Traffic Assignment
  4.4 Convergence Measures

5 Network Data: Graph and Functions
  5.1 Cost Functions
  5.2 Demand Functions
  5.3 Centroids and OD Connectors

6 Network Design Problem
  6.1 Problem Definition
  6.2 Mathematical Programming Models
  6.3 Stability and Subgradients of Traffic Assignment Solution
  6.4 Computing Subgradients of Traffic Assignment Solutions
  6.5 Uniqueness and Strong Regularity in Practice
  6.6 Existence of Solutions to MPEC
  6.7 Stochastic MPEC
  6.8 Toll Pricing Problem (TP)
  6.9 Capacity Expansion Problem (CEP)

7 Approach I: Implicit Programming
  7.1 Solving the Traffic Assignment Problem
  7.2 Solving the Sensitivity Analysis Problem
  7.3 Solving the Stochastic Extension
  7.4 Solving the Network Design Problem Locally
    7.4.1 SDBH
    7.4.2 SNOPT
    7.4.3 LMBM-B
  7.5 Solving the Network Design Problem Globally
    7.5.1 …
    7.5.2 NFFM
    7.5.3 EGO
    7.5.4 DIRECT
  7.6 Implementation and Usage Details
    7.6.1 DSDTAP
    7.6.2 SDBH
    7.6.3 SNOPT
    7.6.4 LMBM-B
    7.6.5 NFFM
    7.6.6 EGO
    7.6.7 DIRECT

8 Approach II: Cutting Constraint Algorithm

9 Numerical Experiments
  9.1 Problem Set
    9.1.1 Harker and Friesz CEP (HF CEP)
    9.1.2 Sioux Falls Fixed Demand CEP (SFF CEP)
    9.1.3 Sioux Falls Elastic Demand TP (SFE TP)
    9.1.4 Small Stockholm Elastic Demand TP (STHLM TP)
    9.1.5 Anaheim Fixed Demand CEP (ANA CEP)
    9.1.6 Barcelona Fixed Demand (BARC)
    9.1.7 Summary of Problems
  9.2 Precision of Objective Function
  9.3 Evaluation of Rules for Defining Used Routes
  9.4 NFFM on Six-Hump Camel Function
  9.5 DSDTAP on a Trivial Elastic Demand Problem
  9.6 Time Complexity in Number of Scenarios of LMBM-B and CCA for Stochastic Extension
  9.7 Sioux Falls with Elastic Demand First-Best Toll Pricing Problem (SFE FB TP)
  9.8 Harker and Friesz Capacity Expansion Problem (HF CEP)
    9.8.1 Local and Global Optimization
    9.8.2 Investigation of Failure of NFFM on HF CEP
    9.8.3 Global Optimization on HF 2 CEP
    9.8.4 Stochastic Optimization
  9.9 Sioux Falls with Fixed Demands Capacity Expansion Problem (SFF CEP)
  9.10 Stockholm Toll Pricing Problem with Cordon J2 (STHLM J2 TP)
    9.10.1 Local and Global Optimization
    9.10.2 Stochastic Optimization
  9.11 Anaheim Capacity Expansion Problem (ANA CEP)
  9.12 Barcelona with Fixed Demand (BARC)

10 Discussion

11 Conclusions

A Network data
  A.1 Anaheim

Acronyms

BFGS Broyden-Fletcher-Goldfarb-Shanno (Hessian approximation method)
BPR Bureau of Public Roads (cost function)
CCA Cutting Constraint Algorithm (Approach II)
CEP Capacity Expansion Problem (special Network Design Problem)
CVAR Conditional Value at Risk (objective function for stochastic optimization)
DIRECT Dividing Rectangles (method for global optimization)
DSD Disaggregate Simplicial Decomposition (method for solving the traffic assignment problem)
DSDTAP DSD Traffic Assignment Problem (implementation of DSD)
EGO Efficient Global Optimization (method for global optimization)
LMBM-B Limited Memory Bundle Method, Bounded (method for local optimization)
MFCQ Mangasarian-Fromovitz Constraint Qualification (constraint qualification)
MNL Multinomial Logit model (model for elastic demand)
MPCC Mathematical Program with Complementarity Constraints (optimization model class)
MPEC Mathematical Program with Equilibrium Constraints (optimization model class)
NDP Network Design Problem (optimization model)
NLP Nonlinear Programming (optimization model class)
OBA Origin-Based Algorithm for the Traffic Assignment Problem (method for solving the traffic assignment problem)
QP Quadratic Programming (optimization model class)
RMP Restricted Master Problem (optimization model, part of CCA)
SAA Sample Average Approximation (discretization method)
SCEP Stochastic CEP (special stochastic Network Design Problem)
SDBH Steepest-Descent-BFGS-Hybrid (method and implementation for local optimization)
SMPEC Stochastic MPEC (optimization model class)
SNOPT Software for Large-Scale Nonlinear Optimization (implementation of a local optimization method)
SQP Sequential Quadratic Programming (method for local optimization)
SQOPT Software for Large-Scale Linear and Quadratic Programming
STP Stochastic TP (special stochastic Network Design Problem)
TAP Traffic Assignment Problem (optimization model)
TAPAS Traffic Assignment by Paired Alternative Segments (method for solving the traffic assignment problem)
TP Toll Pricing Problem (special Network Design Problem)
TU71 Unknown (cost functions)
VOT Value of Time (parameter)

Glossary of Notation

Network Model

Definitions can be found in the section specified in the last column.

G = (N, A)          Traffic network graph G with nodes N and links A              4.1
a                   Link                                                          4.1
C, k                Set of origin-destination pairs and member k                  4.1
Gk = (N, Ak)        Traffic network graph understood by travelers on OD pair k    4.1
R                   Set of all routes                                             4.1
Rk                  Set of routes for OD pair k                                   4.1
r                   Route                                                         4.1
M = (mij)           Incidence matrix for G                                        4.1
Mk                  Incidence matrix for Gk                                       4.1
d = (dk)            Demand variable                                               4.1
D = (Dk)            Demand function                                               4.1
π = (πk)            Travel cost variable                                          4.1
t = (ta)            Link cost function                                            4.1
v = (va)            Link flow variable                                            4.1
h = (hkr) = (hr)    Route flow variable                                           4.1
δ = (δkra) = (δra)  Route-link incidence value                                    4.1
Λ                   Route-link incidence matrix                                   4.1
Γ                   Route-OD pair incidence matrix                                4.1
z                   Traffic assignment objective function                         4.2
y = (v, d)          Composed traffic assignment solution                          4.2
V                   Set of feasible cycle-free flow solutions y                   4.2
C                   Gradient of z w.r.t. y, i.e., C = ∇y z                        4.2
SS                  Social surplus function                                       4.3
UC                  User cost function                                            4.3
SC                  Social cost function                                          4.3
RDG                 Relative duality gap of traffic assignment solution           4.4

Network Data

Definitions can be found in the section specified in the last column.

b                     Parameters for BPR cost function                    5.1
c                     Parameters for TU71 cost function                   5.1
Tk, Ak, Kk, π^0, α    Parameters for MNL demand function                  5.2
dk^max, πk^min        Maximum demand and corresponding minimum cost       5.2

Network Design Problem

Definitions can be found in the section specified in the last column.

F                   Network design problem objective function                                   6.1
x                   Design variables                                                            6.1
X                   Set of feasible design variables                                            6.1
x^L, x^U            Lower and upper bounds on x defining X                                      6.1
S                   Set of traffic assignment solutions given design x                          6.2
σ                   Single-valued function mapping design x to traffic assignment solution      6.2
ξ                   Perturbation of traffic assignment solution                                 6.3
K                   Set of feasible perturbations at y                                          6.3
Πk^1, Πk^2, Πk^3    Sets of used, degenerate and unused routes, respectively                    6.4
A^0                 Subset of links for which the derivative of the cost function is zero       6.5
A^+                 Subset of links for which the derivative of the cost function is nonzero    6.5
A^×                 Subset of links that are unused and belong to nondegenerate routes          6.5
G0k                 Subgraph of Gk where the set of links is A^0                                6.5
M0k                 Incidence matrix for graph G0k                                              6.5
Ṽ                   Set of feasible flow solutions y (including cycles in routes)               6.5
Ω                   Sample set                                                                  6.7
Θ                   Set of events                                                               6.7
P                   Probability measure                                                         6.7
ω                   Stochastic variable                                                         6.7
N                   Number of samples or scenarios in discretization                            6.7
ω1, ..., ωN         Samples from discretization                                                 6.7
β                   Parameter for conditional value-at-risk objective function                  6.7
Gβ                  Conditional value-at-risk objective function                                6.7
γ                   Auxiliary variable for conditional value-at-risk objective function         6.7
F̂N                  Discretized expectation value objective function                            6.7
ĜN                  Discretized conditional value-at-risk objective function                    6.7
I                   Set of toll gate groups or expansion groups                                 6.8, 6.9
W = (wai)           Toll gate group or expansion group matrix                                   6.8, 6.9
τ                   Toll price variable                                                         6.8
t̂                   Link cost function including toll                                           6.8
ρ                   Expansion variable                                                          6.9
φ                   Investment cost function                                                    6.9

Implicit Programming

Definitions can be found in the section specified in the last column.

ek            Excess demand variable                                    7.1
DSD           Error tolerance in relative duality gap for DSDTAP        7.1
fuzz          Parameter for assignment of routes to Πk^1, Πk^2, Πk^3    7.1
BFGS, T, L    Parameters for SDBH                                       7.4.1
P             Filled function                                           7.5.2
µ, µmax       Filled function parameter and its maximum value           7.5.2
δD            Disturbance constant for NFFM                             7.5.2

Numerical Experiments

Definitions can be found in the section specified in the last column.

NF    Number of function evaluations required (Approach I)    9.3
NI    Number of iterations required (Approach II)             9.8.1
T     Time elapsed before termination                         9.8.1

1 Introduction

This work has its background in the problem of road traffic congestion. A road traffic network is an infrastructure used for transportation of, among many other things, people to and from work, goods from factories to stores, and firetrucks or ambulances on duty. From society's point of view, a heavily congested traffic network is costly, since it delays the traffic in it. There are therefore grounds for research on how to reduce congestion.

The problem of road congestion has long been subject to research, and there are well-established mathematical models that describe traffic networks through cost and demand functions, and the behavior of road users through equilibrium definitions ([BMW56], [Pat94], [She85]). Means for reducing congestion have also been developed and formalized in mathematical models. In this work, capacity expansions of roads and toll pricing are the means considered; formalized mathematically, they constitute mathematical programs with equilibrium constraints (MPEC) with continuous decision variables. This class of optimization problems is called the network design problem (NDP). Recently ([Pat08]), stochastic elements have been added to the models, allowing for modeling of unpredictable phenomena such as varying weather conditions, fluctuating demand or uncertainty in investment budget calculations. These problems are formalized as stochastic mathematical programs with equilibrium constraints (SMPEC).

How can one solve large-scale deterministic and stochastic network design problems in practice? Are conventional descent methods sufficient for finding good solutions to the network design problem, or can substantially better solutions be found by employing global optimization methods? This work is an attempt to answer these questions through a study of the numerical properties and performance of a number of optimization methods on traffic network design problems of varying sizes.
Section 3 contains a brief discussion on the choice of track and solution approaches considered. It is followed by three sections (Sections 4, 5 and 6) developing the theory behind our models and methods. Two major approaches to solving the traffic network design problem are considered; they are described in Sections 7 and 8. The emphasis throughout the work is on the approach described in Section 7. In Section 9 the numerical experiments are presented, followed by a discussion in Section 10. Section 11 concludes the thesis.

2 Contribution

The main contribution of this work is a series of numerical experiments in Section 9 indicating the applicability of already existing solution methods to large-scale and stochastic traffic network design problems. To assess the applicability of a method we investigate whether its performance scales well with problem size (both in terms of network size and number of stochastic scenarios) and how its solutions compare with the best known solutions. As a result of the theoretical development, a number of theorems have been proved in Section 6.5, giving practical conditions for applying the implicit programming approach to networks with OD connectors and BPR cost functions.

3 Solution Approaches

Our aim is to solve the deterministic network design problem locally and globally, and also to solve a stochastic version of the network design problem locally (where data comes from a probability distribution). The network design problem will be defined mathematically in Section 6, but already in this section a brief survey of the considered solution approaches is presented, since the decisions regarding solution approaches have shaped the theoretical discussion in the following few sections.

In a general form, our problem can be formulated as a mathematical program with equilibrium constraints (MPEC, described in Section 6.2). Several MPEC solution approaches have been considered: [Fle+06; ATW07; DeM+05; LS04; KDB09; LLCN06; RB05; LH04; Jos03; OKZ98]. Two approaches were chosen in this work, both of which have previously proven successful for network design problems.

The main solution approach is what we call Approach I in this work (see Section 7). It is an implicit programming approach ([Jos03; OKZ98]) and has the advantage that already existing software for the traffic assignment problem, for the upper-level problem and for global optimization can be used. A disadvantage is that the objective functions are not differentiable everywhere, and computing useful gradients or subgradients (where possible) requires high-precision traffic assignment solutions.

The other solution approach considered is the one presented in [LH04], which is a nonlinear programming (NLP) approach of cutting plane type in the setting of toll pricing problems. It is called Approach II in this work (see Section 8).

4 Traffic Assignment Problem

A cornerstone of a network design problem is the problem of finding the user behavioral response to a network design. How do we describe the traffic situation in a road network? What simplifying assumptions do we have to impose to make the problem tractable? In this section, notation for the description of a traffic situation will be established, and the problem of assigning traffic according to the users' choice of route (user equilibrium) and the problem of assigning traffic to maximize the utilization of the traffic system (system optimum) will be defined. The problem of assigning traffic in a network is called the traffic assignment problem (TAP).

4.1 Model Description

The model we will use is static and macroscopic, which means that it is time independent (in contrast to dynamic models) and treats the flow of cars as a continuous variable instead of treating each car as a separate entity; the latter modeling paradigm is known as microscopic in the literature. Since the chosen model is static, but reality certainly is not, only the time of day when traffic is most heavily congested is considered, and it is assumed that an equilibrium flow has settled at this time. The macroscopic modeling of the vehicle flow makes the model mathematically tractable for larger systems.

The travelers (users) and their transportation needs are modeled as flow demands between pairs of locations in the road network, called origin-destination pairs (OD pairs). The response of the travelers is defined as the user equilibrium, which is characterized by Wardrop's first principle:

Definition 1. (User Equilibrium) For each OD pair, at user equilibrium, the travel time on all used paths is equal, and also less than or equal to the travel time that would be experienced by a single vehicle on any unused path.

Another way of expressing it is that each driver chooses the fastest route from the origin to the destination. The model used here is exactly the one developed in the book [She85].

The model is based on a series of unrealistic but common assumptions. A traveler is assumed to have perfect information about the traffic conditions, so that the perceived delay of a chosen route is the actual delay. Differentiation in delay perception between drivers can, however, be achieved by using a model incorporating stochastic user equilibrium (SUE). This has not been done in this work; see [She85, Part IV] for more about SUE.

The data to be provided for a traffic flow model are: the traffic network, the transport demand and the cost functions ([MP07]). Let the traffic network be modeled as a directed graph G = (N, A) with a set of nodes N and links (or arcs) A, where nodes correspond to road intersections and links to roads. The directed graph can be described by an incidence matrix M ∈ R^{|N|×|A|} such that mij = −1 if node i is the tail of link j and mij = 1 if node i is the head of link j.

For the transport demand we define a set of OD pairs C ⊆ N × N, where each pair k = (i, j) ∈ C corresponds to a transportation demand from node i to node j. If the transportation demand depends on the travel cost (or travel delay) we say that we have elastic demand; otherwise we have fixed demand. In this work we will consider both fixed and elastic demand. In the case of fixed demand, we denote it by a constant dk for each OD pair k ∈ C. For elastic demand we need a model for the elasticity, and that model is a demand function Dk : πk ↦ dk for each OD pair k, which maps the travel cost πk to the travel demand dk.
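To make the elastic demand notation concrete, here is a minimal sketch with a hypothetical exponential demand function; the functional form and the parameters D_MAX and ALPHA are illustrative assumptions only (the thesis itself uses an MNL-type demand model, introduced in Section 5.2):

```python
import math

# Hypothetical elastic demand model for one OD pair k (illustration only):
# D_k(pi) = d_max * exp(-alpha * pi), strictly decreasing, hence invertible.
D_MAX = 1000.0   # assumed maximum demand at zero travel cost
ALPHA = 0.05     # assumed cost sensitivity

def demand(pi):
    """D_k: travel cost pi -> travel demand d."""
    return D_MAX * math.exp(-ALPHA * pi)

def inverse_demand(d):
    """D_k^{-1}: travel demand d -> travel cost pi (for 0 < d <= D_MAX)."""
    return -math.log(d / D_MAX) / ALPHA

# The inverse really undoes the demand function:
pi = 12.0
assert abs(inverse_demand(demand(pi)) - pi) < 1e-9
```

The invertibility checked here is exactly what is needed to make sense of the term with Dk^{-1} in the optimization formulation of Section 4.2.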

Now that OD pairs have been defined, we briefly turn our attention to the graph again. For some networks, travelers from different OD pairs will understand different subgraphs of G as the actual traffic network, and we will denote by Gk = (N, Ak), with Ak ⊆ A, the subgraph of G that travelers from OD pair k ∈ C understand as the actual traffic network. The incidence matrix for each subgraph Gk is denoted by Mk. A more detailed description of these subgraphs is given in Section 5.3.

Cost functions (also known as delay functions) describe the cost of traveling in the network, most commonly measured in time or in monetary units. The cost of traveling between the two nodes of an OD pair is assumed to be the sum of the costs of traveling on the links that comprise the path between the two nodes. This is known as the assumption of additive costs. To describe the cost of travel we need a cost function for each link in the network. Let the cost function for link a ∈ A be denoted by ta(va), where va is the non-negative flow on link a. It is assumed that the travel cost on a link depends solely on the flow on that link. This assumption is not realistic for all traffic networks, since traffic in one direction of a road might be slowed down if the traffic in the other direction is congested (recall that the graph is directed).

Further, we need a representation of the traffic state in the network. Several representations can be found in the literature. One that is very common is the link-route representation. For each OD pair k = (i, j) ∈ C we have a set Rk of all cycle-free paths of Gk connecting node i and node j. We also define R = ∪_{k∈C} Rk as the set of all considered routes in the network. The flow on path r ∈ Rk is denoted by hkr or hr.
A route consists of a chain of links that connect the origin with the destination, and to describe a route we use δ, defined as

    δkra = δra = 1 if link a is part of route r in OD pair k, and 0 otherwise.

For a more compact representation, an arc-chain (route-link) incidence matrix

    Λ = (δra) ∈ R^{|A|×|R|},    (1)

and a matrix relating routes to OD pairs,

    Γ = (γrk) ∈ R^{|R|×|C|},  where γrk = 1 if r ∈ Rk and 0 otherwise,    (2)

are introduced. To use this representation in practice for anything but small-scale problems, one must use column generation or similar techniques to generate the needed paths from Rk, since the total number of paths in general grows very fast with the size of the network.

Another representation is the link-node representation, where we denote by vk ∈ R+^{|A|} the flow in the network caused by the transportation demand in OD pair k, i.e., the network link flow is disaggregated over the OD pairs. With this link flow representation it is possible to set up conditions for the demands being met, which would not be possible with a total link flow representation. The number of variables increases polynomially in the size of the network. This representation is also common in the literature; see, e.g., [Ral08].

A third representation is the origin-aggregated link-node representation. For this one we have a vector vl ∈ R+^{|A|} of flows for each origin l ∈ O = {i | ∃j : (i, j) ∈ C}. It can be interpreted as one network with link flows for each origin. Like the previous link flow representation, it is possible to set up conditions for when the demands are met. This yields much fewer variables than the previous representation and is hence easier to use for larger problems.

We will now illustrate what a user equilibrium is, and introduce the reader to the notation, by an example based on one from [She85, page 75]. The example shows how adding a link to a network might increase the travel times in that network.

Example 1. (Notation and User Equilibrium – Braess' Paradox) (a) Consider the network in Figure 1(a). It has four links, which we identify by numbers according to the figure. The link set is A = {1, 2, 3, 4}. There are four nodes, N = {A, B, C, D}. We define

[Figure: (a) Four-link network with links 1, 4 (via node C) and 3, 2 (via node D) between A and B; (b) five-link network with the additional link 5 between D and C.]

Figure 1: The two networks considered in Example 1.

one OD pair (A, B), hence C = {(A, B)}. The cost functions of the links are ta(va) = 50 + va for a = 1, 2 and ta(va) = 10va for a = 3, 4, and we let the demand d(A,B) be equal to 6. Two routes, (1, 4) and (3, 2), can be identified for the OD pair, hence R(A,B) = {(1, 4), (3, 2)}. For each route we have a flow, denoted by h(1,4) and h(3,2) respectively. The flows on the links are related to the route flows as v1 = v4 = h(1,4) and v3 = v2 = h(3,2). If we map the nodes A, ..., D to the numbers 1, ..., 4, the incidence matrix M ∈ R^{|N|×|A|} is

        [ -1   0  -1   0 ]
    M = [  0   1   0   1 ]
        [  1   0   0  -1 ]
        [  0  -1   1   0 ]

Thanks to the symmetry of the problem, the user equilibrium solution can immediately be determined as h(1,4) = 3 and h(3,2) = 3, giving v = (3, 3, 3, 3) and the travel cost t1 + t4 = t3 + t2 = 83. The total travel cost in the system is Σ_{a=1}^{4} ta va = 498. This solution is consistent with the definition of user equilibrium in Definition 1.

(b) We now add a fifth link to the network and get the network in Figure 1(b). The new link has cost function t5(v5) = 10 + v5. A new column (0, 0, 1, −1)^T is added to M. As a consequence of adding the new link, a new route can be identified: (3, 5, 4). Assume that we have the flow solution from (a) and that v5 = 0. The cost of the new route then is t3 + t5 + t4 = 30 + 10 + 30 = 70, which is less than 83, i.e., less than the cost of the other routes. That solution is hence not a user equilibrium solution. The new user equilibrium solution becomes v = (2, 2, 4, 4, 2) with equal flow on all routes. The cost of each route then is 92. The total travel cost is Σ_{a=1}^{5} ta(va) va = 552. Note that these costs are higher than in (a). This is what is known as Braess' paradox: adding a link to the network may increase the travel cost for all users and also for the system as a whole.
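The numbers in Example 1 are easy to verify numerically. The following sketch (our own check, using ad hoc variable names rather than the thesis notation) rebuilds the incidence matrix M, maps route flows to link flows, and confirms the route costs and total travel costs of both networks:

```python
# Verify Example 1: flow conservation, route costs and total cost
# for the four-link network (a) and the five-link network (b).

def link_costs(v):
    # t1, t2: 50 + v; t3, t4: 10 v; t5: 10 + v (link 5 only in network (b))
    t = [50 + v[0], 50 + v[1], 10 * v[2], 10 * v[3]]
    if len(v) == 5:
        t.append(10 + v[4])
    return t

def route_cost(route, v):
    t = link_costs(v)
    return sum(t[a - 1] for a in route)

def total_cost(v):
    return sum(t * f for t, f in zip(link_costs(v), v))

# (a) Equilibrium h = (3, 3) on routes (1,4) and (3,2) gives v = (3,3,3,3).
v_a = [3, 3, 3, 3]
assert route_cost((1, 4), v_a) == route_cost((3, 2), v_a) == 83
assert total_cost(v_a) == 498

# Flow conservation M v: 6 units leave node A, 6 arrive at B; C, D balanced.
M = [[-1,  0, -1,  0],
     [ 0,  1,  0,  1],
     [ 1,  0,  0, -1],
     [ 0, -1,  1,  0]]
assert [sum(m * f for m, f in zip(row, v_a)) for row in M] == [-6, 6, 0, 0]

# (b) Equilibrium h = (2, 2, 2) on routes (1,4), (3,2), (3,5,4) gives
#     v = (2, 2, 4, 4, 2); all route costs equal 92.
v_b = [2, 2, 4, 4, 2]
assert all(route_cost(r, v_b) == 92 for r in [(1, 4), (3, 2), (3, 5, 4)])
assert total_cost(v_b) == 552   # higher than 498: Braess' paradox
```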

4.2 As Solution to an Optimization Problem From [She85, page 59] we have that the user equilibrium solution is the solution to the following opti- mization problem:

    minimize_{(v,h,d)}  z(v, d) = Σ_{a∈A} ∫_0^{va} ta(ω) dω − Σ_{k∈C} ∫_0^{dk} Dk^{-1}(ω) dω,

    subject to   Σ_{r∈Rk} hkr = dk,                ∀k ∈ C,
                 h ≥ 0,  d ≥ 0,                                        (3)
                 va = Σ_{k∈C} Σ_{r∈Rk} δkra hkr,   ∀a ∈ A.

In the above program, the vectors h, d and v are composed of the route flows hkr, demands dk and link flows va respectively, in some arbitrary but, throughout this text, consistent order. This naming convention will be used for other vectors as well, where the absence of an index means a vector of the indexed quantity with the same name. The feasible set in terms of (v, d) can also be presented more compactly as

    V = { (v, d) ∈ R^{|A|+|C|} | ∃h ≥ 0 : Γ^T h = d, Λh = v },

where Λ and Γ are defined in (1) and (2) respectively. The set V will be referred to in later sections, but for the remainder of this section the more spacious representation will be used. A fixed demand version of the traffic assignment problem will also be used in this work:

    minimize_{(v,h)}  z(v) = Σ_{a∈A} ∫_0^{va} ta(ω) dω,

    subject to   Σ_{r∈Rk} hkr = dk,                ∀k ∈ C,
                 h ≥ 0,                                                (4)
                 va = Σ_{k∈C} Σ_{r∈Rk} δkra hkr,   ∀a ∈ A,

where dk ≥ 0 are constants.

The following discussion on the optimization problem will mainly cover the elastic demand case. However, the results are also applicable to the fixed demand case. In [MP07], conditions for existence and uniqueness of solutions to (3) and (4) are given. The following assumptions on the network and data functions will be in effect during the remainder of this text, and they guarantee the existence of solutions to the two programs above (see Theorem 4 in [MP07]):

Assumption 1. (Existence of Traffic Assignment Solution)

(1.i) The network G is strongly connected with respect to the OD pairs, i.e., there is at least one route for each OD pair.

(1.ii) Dk(πk) is positive and bounded above. It is also invertible where dk = Dk(πk) > 0.

Further, for uniqueness of the link flow and demand solution y = (v, d) we add the following assumption (see [OKZ98, Theorem 4.4(i)]):

Assumption 2. (Uniqueness of Traffic Assignment Solution)

(2.i) The vector-valued function

    C(y) = ( t(v), −D^{-1}(d) )

is strictly monotone on the feasible set V, i.e.,

    (C(y1) − C(y2))^T (y1 − y2) > 0,   ∀y1, y2 ∈ V, y1 ≠ y2.
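For the separable cost and demand functions used later in this thesis, strict monotonicity of C can be checked componentwise. A short sketch of the argument (ours, with strictly increasing ta and strictly decreasing Dk^{-1} taken as assumptions):

```latex
% Sketch: with each t_a strictly increasing and each D_k^{-1} strictly
% decreasing, C is strictly monotone on V:
\bigl(C(y_1)-C(y_2)\bigr)^{\mathsf T}(y_1-y_2)
  = \sum_{a\in\mathcal{A}} \bigl(t_a(v_{1a})-t_a(v_{2a})\bigr)\,(v_{1a}-v_{2a})
  + \sum_{k\in\mathcal{C}} \bigl(D_k^{-1}(d_{2k})-D_k^{-1}(d_{1k})\bigr)\,(d_{1k}-d_{2k})
  > 0 \quad \text{for } y_1 \neq y_2,
% since every term is nonnegative, and each component where y_1 and y_2
% differ contributes a strictly positive term.
```

Under these conditions on the data functions, the assumption above is therefore satisfied.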

To show that the solution to (3) really is a user equilibrium solution, i.e., conformal with Definition 1, we will derive the first-order optimality conditions for the problem. We use that v = v(h) from the last row of (3). First we note that (3) fulfills the Abadie constraint qualification, since all constraints are linear or affine. This means the KKT conditions are necessary for any local minimum. We form the partial Lagrangian, with π = (πk) ∈ R^{|C|} as Lagrange multipliers,

    L(v(h), d, π) = z(v(h), d) + Σ_{k∈C} πk ( dk − Σ_{r∈Rk} hkr ),

which should be minimized with respect to the flow variables and maximized with respect to the dual variables, subject to hkr ≥ 0 ∀r ∈ Rk, k ∈ C, and dk ≥ 0 ∀k ∈ C. The first-order conditions for this program are:

    hkr ∂L(h, d, π)/∂hkr = 0,   ∀r ∈ Rk, k ∈ C,   (5a)
    ∂L(h, d, π)/∂hkr ≥ 0,       ∀r ∈ Rk, k ∈ C,   (5b)
    dk ∂L(h, d, π)/∂dk = 0,     ∀k ∈ C,           (5c)
    ∂L(h, d, π)/∂dk ≥ 0,        ∀k ∈ C,           (5d)
    ∂L(h, d, π)/∂πk = 0,        ∀k ∈ C,           (5e)
    hkr ≥ 0,                    ∀r ∈ Rk, k ∈ C,   (5f)
    dk ≥ 0,                     ∀k ∈ C.           (5g)

To simplify this set of equations, start with (5a) and (5b) and calculate

    ∂L(·)/∂hkr = ∂/∂hkr Σ_{a∈A} ∫_0^{Σ_{k̂∈C} Σ_{r̂∈Rk̂} δk̂r̂a hk̂r̂} ta(ω) dω − πk
               = Σ_{a∈A} δkra ta(va) − πk
               = ckr − πk,

where ckr(h) = Σ_{a∈A} δkra ta(va) is the route cost for route r in OD pair k. We continue with (5c) and (5d) by calculating

    ∂L(·)/∂dk = −∂/∂dk Σ_{k∈C} ∫_0^{dk} Dk^{-1}(ω) dω + πk
              = πk − Dk^{-1}(dk).

Equation (5e) is simply the original demand constraint, so we can summarize the first-order conditions as

    hkr (ckr − πk) = 0,          ∀r ∈ Rk, k ∈ C,   (6a)
    ckr − πk ≥ 0,                ∀r ∈ Rk, k ∈ C,   (6b)
    dk (πk − Dk^{-1}(dk)) = 0,   ∀k ∈ C,           (6c)
    πk − Dk^{-1}(dk) ≥ 0,        ∀k ∈ C,           (6d)
    dk − Σ_{r∈Rk} hkr = 0,       ∀k ∈ C,           (6e)
    hkr ≥ 0,                     ∀r ∈ Rk, k ∈ C,   (6f)
    dk ≥ 0,                      ∀k ∈ C.           (6g)

These conditions can easily be mapped to the user equilibrium in Definition 1. Condition (6a) says: if a route is used (hkr > 0), then it has the same cost as all other used routes for that OD pair (ckr = πk), and (6b) says that πk is the cost of the cheapest route. Ergo, the travel times of all used paths are equal, and also less than or equal to the possible travel times of all unused paths. Conditions (6c) and (6d) are related to the elastic demand and say that the demand and cost relation follows the demand function if the demand is not zero. If the demand is zero, then the cost might exceed the break point value of the demand function for zero demand. Since the first-order conditions are necessary for optimality, the unique minimum of the program is a user equilibrium solution.
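To make the formulation concrete, the fixed-demand problem (4) can be solved for the five-link network of Example 1 with the classical Frank-Wolfe (convex combinations) method. This is our own minimal sketch, not the DSD method used later in the thesis; the closed-form line search relies on the example's affine link costs:

```python
# Frank-Wolfe applied to problem (4) on the five-link network of Example 1:
# demand 6 from A to B, routes (1,4), (3,2) and (3,5,4). Illustration only.

routes = [(1, 4), (3, 2), (3, 5, 4)]
demand = 6.0
slopes = [1.0, 1.0, 10.0, 10.0, 1.0]   # dt_a/dv_a for the affine link costs

def t(v):
    return [50 + v[0], 50 + v[1], 10 * v[2], 10 * v[3], 10 + v[4]]

def all_or_nothing(route):
    """Link flows when the whole demand is loaded on one route."""
    w = [0.0] * 5
    for a in route:
        w[a - 1] = demand
    return w

v = all_or_nothing(routes[0])          # start: everything on route (1,4)
for _ in range(100):
    tv = t(v)
    c = [sum(tv[a - 1] for a in r) for r in routes]      # route costs
    w = all_or_nothing(routes[c.index(min(c))])          # FW direction
    num = -sum(ta * (wa - va) for ta, wa, va in zip(tv, w, v))
    den = sum(sa * (wa - va) ** 2 for sa, wa, va in zip(slopes, w, v))
    if den == 0.0 or num <= 0.0:
        break                          # duality gap zero: equilibrium found
    v = [va + min(1.0, num / den) * (wa - va) for va, wa in zip(v, w)]

# Converges to the equilibrium of Example 1(b): v = (2, 2, 4, 4, 2), where
# all three route costs equal 92, satisfying conditions (6a)-(6b).
assert all(abs(va - ve) < 1e-6 for va, ve in zip(v, (2, 2, 4, 4, 2)))
```

The subproblem in each iteration is an all-or-nothing assignment at frozen link costs, i.e., exactly the auxiliary solution used in the duality gap measure of Section 4.4.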

4.3 System Optimal Traffic Assignment

Traffic flow can of course not only be assigned to the network using Wardrop's principle, but also in other ways. An alternative is the system optimal traffic assignment, which is discussed here. In the user equilibrium problem, defined as an optimization problem in Section 4.2, we minimize the cost for each driver separately in the system, assigning the traffic according to Definition 1. However, if we instead change the objective function to optimize the traffic assignment for the sake of the system or society, the traffic will be assigned for optimal utilization of the traffic system.

Social surplus (SS) is a measure of the economic value of the traffic system for society and is defined as the difference between user benefit (UB) and social cost (SC). The user benefit is a measure of what all traffic network users gain from using the system, while the social cost is a measure of the cost of the system for society as a whole. The social cost is the time spent by all users in the system and is defined as

    SC = Σ_{a∈A} ta(va) va.

The user benefit is defined as the value of the actual trips being made:

UB = Σ_{k∈C} ∫_0^{d_k} D_k^{-1}(ω) dω.

The full objective function, which is a measure of the negative social surplus, becomes

−SS(v, d) = z(v, d) = SC(v) − UB(d) = Σ_{a∈A} t_a(v_a) v_a − Σ_{k∈C} ∫_0^{d_k} D_k^{-1}(ω) dω. (7)

Hence, we solve problem (3) with the objective function in (7). This means the traffic is explicitly assigned to give maximum social surplus, and we call that solution a system optimal (SO) solution. If the demand is fixed, the user benefit is constant and we use only SC as the objective function.
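As an illustration, the negative social surplus (7) can be evaluated numerically; the one-link cost function and linear inverse demand function below are hypothetical stand-ins, not taken from any network in this work:

```python
def social_cost(link_costs, link_flows):
    """SC = sum_a t_a(v_a) * v_a, with each t_a evaluated at the current flow."""
    return sum(t(v) * v for t, v in zip(link_costs, link_flows))

def user_benefit(inv_demands, demands, n=10_000):
    """UB = sum_k integral_0^{d_k} D_k^{-1}(w) dw, via the midpoint rule."""
    total = 0.0
    for Dinv, d in zip(inv_demands, demands):
        h = d / n
        total += sum(Dinv((i + 0.5) * h) for i in range(n)) * h
    return total

def neg_social_surplus(link_costs, link_flows, inv_demands, demands):
    """-SS = SC - UB, the objective (7)."""
    return social_cost(link_costs, link_flows) - user_benefit(inv_demands, demands)

# Toy instance: one link with t(v) = 1 + v, one OD pair with D^{-1}(d) = 10 - d.
# SC = (1 + 3) * 3 = 12, UB = 10*3 - 3^2/2 = 25.5, so -SS = -13.5.
z = neg_social_surplus([lambda v: 1 + v], [3.0], [lambda d: 10 - d], [3.0])
print(z)
```

The midpoint rule is exact for the linear inverse demand used here; for the MNL functions of Section 5.2 the integrand is singular at zero and would need more care.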

4.4 Convergence Measures

When solving the traffic assignment problem numerically, one needs a termination criterion that involves checking some measure of convergence. This section reviews the convergence measures that will be used by the numerical methods described in Sections 7 and 8. A thorough discussion on convergence measures for the traffic assignment problem is given in [RDK88]. One convergence measure is the duality gap. It is defined as follows:

DG = Σ_{a∈A} t_a(v_a)(v_a − v_a^{(S)}),

where v_a^{(S)} is the link flow corresponding to an all-or-nothing assignment of the traffic for flow v_a. (All-or-nothing assignment for flow v_a means that the cost of each link a is held constant and equal to t_a(v_a) and that each traveler takes the cheapest route. Note that there are no congestion effects!) If v_a is the true solution, then all travelers are already on routes that are equally expensive and cheaper than all other routes, which is why DG is zero. Another way of expressing this is in route flows and costs:

DG = Σ_{k∈C} Σ_{r∈R_k} h_{kr}(c_{kr} − c_k^{min}),

where c_k^{min} = min_r c_{kr}. The duality gap is often divided by a normalizing factor ([RDK88]), giving the relative duality gap (or just relative gap):

RDG = DG / Σ_{a∈A} t_a(v_a) v_a.

The relative duality gap will be used in the algorithms in this work. However, there is another very common convergence measure, namely the average excess cost (AEC), which is the duality gap divided by the total demand:

AEC = DG / Σ_{k∈C} d_k.

This quantity is not dimensionless and is probably more sensitive to badly scaled cost functions. On the other hand, it has a clear interpretation: it is equal to the average time that road users waste by not using the fastest route.
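The three gap measures can be sketched directly from the route-based formulas above; the route costs, flows and demand below are hypothetical toy data:

```python
def gaps(route_costs, route_flows, demands):
    """Duality gap, relative gap and average excess cost from route data.

    route_costs / route_flows: dict OD pair -> list of route costs / flows;
    demands: dict OD pair -> total demand d_k.
    """
    # DG = sum_k sum_r h_kr * (c_kr - c_k^min)
    dg = sum(h * (c - min(route_costs[k]))
             for k in route_costs
             for c, h in zip(route_costs[k], route_flows[k]))
    # Normalizer sum_a t_a(v_a) v_a equals the total route cost sum_k,r c_kr h_kr
    total_cost = sum(c * h
                     for k in route_costs
                     for c, h in zip(route_costs[k], route_flows[k]))
    rdg = dg / total_cost             # relative duality gap
    aec = dg / sum(demands.values())  # average excess cost (time units)
    return dg, rdg, aec

costs = {1: [5.0, 6.0]}
flows = {1: [8.0, 2.0]}
dg, rdg, aec = gaps(costs, flows, {1: 10.0})
print(dg, round(rdg, 4), aec)  # 2.0 0.0385 0.2
```

Here two travelers sit on a route that is one time unit too expensive, so DG = 2 and each of the ten travelers wastes on average 0.2 time units.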

5 Network Data: Graph and Functions

The data for a network design problem is given by the network graph and the cost and demand functions. Sections 5.1 and 5.2 describe the cost and demand functions that are used for the networks in this work. When creating a network model of a city, it is common to include dummy nodes and links (centroids and OD connectors) in the network; a description of them is presented in Section 5.3.

5.1 Cost Functions

We will consider two models for the cost functions: BPR functions ([Bpr]) and TU71 functions ([Jon95]).

BPR. The BPR cost functions are defined as follows:

t_a(v_a) = b_{a,1} + b_{a,2} (v_a / b_{a,3})^{b_{a,4}}, (8)

where b_{a,1} ≥ 0 and b_{a,2}, b_{a,3}, b_{a,4} > 0. The constant b_{a,1} is known as the free flow time, i.e., the time it takes to travel the link if it is uncongested. There is a redundancy in b_{a,2} and b_{a,3}, and they both indicate the capacity of the link. b_{a,4} is a measure of how pronounced the capacity limit is. The BPR functions fulfill Assumption 2, which gives a unique traffic assignment solution. A troublesome, but nevertheless important, property of the BPR functions is the vanishing derivative at v_a = 0.
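A direct transcription of (8) and its derivative, with made-up parameter values, illustrates the vanishing derivative at v_a = 0 (assuming b_{a,4} > 1):

```python
def bpr(v, b1, b2, b3, b4):
    """BPR travel time (8): t(v) = b1 + b2 * (v / b3)**b4."""
    return b1 + b2 * (v / b3) ** b4

def bpr_deriv(v, b1, b2, b3, b4):
    """dt/dv = b2 * b4 / b3 * (v / b3)**(b4 - 1); zero at v = 0 when b4 > 1."""
    return b2 * b4 / b3 * (v / b3) ** (b4 - 1)

# Hypothetical link: free flow time 10, b2 = 1.5, capacity parameter 1000, power 4.
print(bpr(0.0, 10, 1.5, 1000, 4))        # 10.0, the free flow time
print(bpr_deriv(0.0, 10, 1.5, 1000, 4))  # 0.0, the vanishing derivative at v = 0
```

This zero derivative at zero flow is what later complicates the strong regularity analysis in Section 6.5.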

TU71. The TU71 functions are defined in two intervals of the link flow; as a power function for flows lower than the capacity and as a linear function for flows larger than the capacity:

t_a(v_a) = { v_a/c_{a,1} + c_{a,2}(1 + v_a/c_{a,3})^{c_{a,4}} + c_{a,0},  v_a ≤ v_{a,cap},
           { c_{a,5} + c_{a,6}(v_a − v_{a,cap}) + c_{a,0},               v_a > v_{a,cap},   (9)

where c_{a,0}, v_{a,cap} ≥ 0 and c_{a,1}, c_{a,2}, c_{a,3}, c_{a,4} > 0. An important note: there is no condition saying that t_a should be differentiable at v_a = v_{a,cap}. In the cases where the TU71 functions occur in this work, v_{a,cap} is assumed to be infinite, so that the function is differentiable. Our choice of traffic assignment solver relies on continuous derivatives of t_a and converges slowly if t_a is nonsmooth. Besides, it has not been investigated what would happen to the sensitivity analysis of the TAP solution at those points (see Section 6.4). Under these conditions, the TU71 functions fulfill Assumption 2, which gives a unique traffic assignment solution. The constants can be interpreted similarly to the ones for the BPR functions.
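A transcription of (9), under the assumption v_{a,cap} = ∞ used in this work (the constants below are made up); the finite-difference check illustrates that the derivative stays above the lower bound 1/c_{a,1} used in Section 6.5:

```python
def tu71(v, c0, c1, c2, c3, c4, c5, c6, v_cap=float("inf")):
    """TU71 travel time (9); with v_cap = inf only the power branch is active."""
    if v <= v_cap:
        return v / c1 + c2 * (1 + v / c3) ** c4 + c0
    return c5 + c6 * (v - v_cap) + c0

# Hypothetical constants; c5, c6 are irrelevant when v_cap is infinite.
c = dict(c0=0.0, c1=2.0, c2=1.0, c3=100.0, c4=4.0, c5=0.0, c6=0.0)
eps = 1e-6
d = (tu71(eps, **c) - tu71(0.0, **c)) / eps  # numerical derivative at v = 0
print(d > 1 / c["c1"])  # True: dt/dv >= 1/c1 > 0, even at zero flow
```

In contrast to the BPR functions, the derivative of (9) never vanishes, which is why the TU71 functions cause no difficulties with Assumptions 2 and 4.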

5.2 Demand Functions

For the elastic demand, only one model has been considered, namely the multinomial logit model (MNL) ([McF70]). It models a choice between two travel modes rather than a reduced travel demand. The term modal split is common in the literature for this kind of model.

MNL. A brief introduction to the model is presented in [Eks08]. Here we will simply state the resulting inverse demand function:

D_k^{-1}(d_k) = u_k + (1/α) ln(T_k/d_k − 1), (10)

where u_k, α and T_k are constants. Letting d_k ≤ T_k, we have an inverse demand function. Despite the undefined value at d_k = 0, the integral ∫_0^{d_k} D_k^{-1}(ω) dω exists. Although stated differently in [Eks08], it is possible to use the excess demand problem formulation for this choice of demand function. Some explanation of the quantities is needed: u_k is the cost of utilizing public transportation for individuals in OD pair k and is assumed to be constant. However, MNL models that different travelers estimate the costs of car versus public transport differently, and α is a parameter for the variance in the travelers' cost estimations. T_k is the total number of travelers in the system for OD pair k, including both car users and users of public transport. An alternative version of equation (10) is

D_k^{-1}(d_k) = π^0 + (1/α)( ln(A_k/K_k) + ln(T_k/d_k − 1) ), (11)

where π^0 is the no-toll scenario travel cost. Here A_k is the number of car users and K_k is the number of public transport users, and T_k = A_k + K_k. The above form (11) is the one we will work with, and it is known as the pivot point version of the MNL model. See [Eks08] for the derivation of (11) from (10). A value of the maximum demand d_k^{max} can be computed, given the smallest possible cost for an OD pair. For the MNL elasticity model, d_k^{max} can be obtained from

π_k^{min} = D_k^{-1}(d_k^{max}) = π^0 + (1/α)( ln(A_k/K_k) + ln(T_k/d_k^{max} − 1) ),

which yields

d_k^{max} = T_k / ( (K_k/A_k) e^{α(π_k^{min} − π^0)} + 1 ).

Note that the MNL demand function fulfills Assumption 1.(ii).
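The pivot-point formula (11) and the expression for d_k^{max} can be checked against each other numerically; all parameter values below are hypothetical:

```python
from math import exp, log

def mnl_inverse_demand(d, pi0, alpha, A, K):
    """Pivot-point MNL inverse demand (11); T = A + K is the total demand."""
    T = A + K
    return pi0 + (log(A / K) + log(T / d - 1)) / alpha

def mnl_dmax(pi_min, pi0, alpha, A, K):
    """Maximum demand d_max given the smallest possible OD cost pi_min."""
    T = A + K
    return T / ((K / A) * exp(alpha * (pi_min - pi0)) + 1)

# Hypothetical OD pair: 600 car users and 400 transit users in the no-toll
# scenario (pi0 = 10), smallest possible cost pi_min = 12, alpha = 0.5.
dmax = mnl_dmax(pi_min=12.0, pi0=10.0, alpha=0.5, A=600.0, K=400.0)
print(abs(mnl_inverse_demand(dmax, 10.0, 0.5, 600.0, 400.0) - 12.0) < 1e-9)
# True: D^{-1}(d_max) recovers pi_min, as the derivation requires
```

The check simply confirms that the closed-form d_k^{max} inverts (11) at π_k^{min}.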

5.3 Centroids and OD Connectors

It is common in network modeling to add special links and nodes to the graph, called OD connectors and centroids respectively, where centroids are nodes through which no traffic may flow. Their only purpose is to be origins or destinations of demand. They are artificial nodes, and for connecting them to the actual traffic network, links named OD connectors are used. OD connectors are links with constant cost. Several OD connectors can be connected to the same centroid. The purpose of centroids and OD connectors is to simulate different possible entry and exit points for certain traffic demand. For example, a distant commuter entering a city during rush hour can make decisions about his or her entry point outside the modeled network, if the modeled network doesn't geographically cover his or her true origin. This can be modeled by creating an artificial centroid node, and connecting that node through OD connectors to nodes in the "real" network. It was mentioned in Section 4 that each traveler in OD pair k has an understanding of the traffic network as the subgraph G_k of G. (See Figure 2 for an illustration of the subgraphs.) In such a subgraph, all centroids except the origin and destination of k are removed to make sure no traffic flows through them. Also, OD connectors that are in-going links to the origin and out-going links from the destination are absent in G_k, because they are not needed and, furthermore, make it impossible to establish uniqueness of the traffic assignment solution.

[Figure: the full graph of the traffic network with Centroids 1-3; the subgraph for OD pair 1 (from centroid 1 to centroid 2); the subgraph for OD pair 2 (from centroid 1 to centroid 3).]

Figure 2: Illustrates a full graph G and its two OD pair specific graphs G1 and G2, where OD connectors have been removed. The dashed lines are OD connectors, while the solid lines are not.

6 Network Design Problem

This far, only the problem of traffic assignment has been discussed. Here we turn our attention to our main problem: the network design problem. First, the general network design problem (NDP) is defined in Section 6.1, and Section 6.2 formulates and discusses possible mathematical programs for NDP. In Section 6.3, conditions for the stability of the traffic assignment solution are developed, and Section 6.4 explains how to compute subgradients for our network design objective function. Section 6.5 investigates when the stability conditions are fulfilled for the data functions and networks used in practice, and contains a contribution to the field through a number of theorems. Further, Section 6.6 is a short section establishing the existence of solutions to NDP. In Section 6.7, a review of how to model and solve NDP under uncertainty, in terms of a stochastic MPEC (SMPEC), is presented. In Sections 6.8 and 6.9, two specializations of the network design problem are presented: the toll pricing problem and the capacity expansion problem. Note that the terminology here differs from the standard one. By network design problem is here meant the general problem of making changes in continuous data of the traffic network, i.e., changes in objective, cost and demand functions. The toll pricing problem and the capacity expansion problem are two subclasses of this problem class, where these changes are allowed to be made in a certain way. It is common in the literature to equate the network design problem with what we here call the capacity expansion problem.

6.1 Problem Definition

A network design problem is the problem of determining optimal values of certain parameters of a traffic network. Those parameters could for example be road capacity expansions or toll prices. We limit ourselves to continuous parameters, i.e., we are for example not designing a network in terms of adding or removing links. An example of a network design problem from [CP10b] follows.

Example 2. (Network Design on Braess' Network) This is a continuation of Example 1, which showed that adding a link to a network does not necessarily make the traffic network better in the sense of travel times. Suppose now that we have the network in Figure 1(b), that our goal is to minimize the total travel cost (user cost, UC), and that we can add a constant term τ to the cost function of link 5, so that t_5 = 10 + v_5 + τ, where we are free to choose any τ ≥ 0. This could be regarded as adding a toll fee τ for traveling on link 5. This is an example of a network design problem. The optimal solution here is τ ≥ 13, which leaves link 5 unused and gives us the system optimal solution.

For a general network design problem, we denote our objective function by F = F(x, v, d), which is a function of the design parameters x ∈ R^n, the link flows v ∈ R^{|A|} and the demands d ∈ R^{|C|}. This opens the possibility for many kinds of objective functions. We can for example relate the link flows to environmental impact or congestion, or compute the benefit of the system by using the demand information. In this work we will only consider two kinds of objective functions: user cost (UC) and social surplus (SS). (That is not entirely true: in the numerical implementations and in the experiments, the difference between SS in the designed network and SS in the non-modified network is used instead of SS; this is called the difference in social surplus and is denoted by ∆SS.) The design variables x will be constrained only by simple bounds for the remainder of this text, and the feasible set of design parameters is denoted by X = {x ∈ R^n | x_L ≤ x ≤ x_U}. A general formulation of the network design problem then is:

minimize_{(x,v,d)} F(x, v, d),
subject to x ∈ X, (12)
(v, d) solves the traffic assignment problem with design parameters x.

Here we can see the bilevel optimization nature of the problem: at the upper level we have the goal to minimize some measure of social cost, and at the lower level we solve an optimization problem to obtain the traffic assignment solution.

6.2 Mathematical Programming Models

In this section, mathematical formulations of the general network design problem (12) will be discussed. We will start by formulating it as an MPEC (Mathematical Program with Equilibrium Constraints), then present the formulation on which the implicit programming approach relies. Further, a more explicit form of the equilibrium constraints will be formulated, and some scaling problems with it discussed. The traffic will be assigned to the network according to the user equilibrium flow solution, which (as shown in Section 4.2) can be determined by the first-order optimality conditions of the optimization problem (3). Assumptions 1 and 2 guarantee the existence and uniqueness of the traffic assignment solution. The first-order optimality conditions of (3) can equivalently be written as the variational inequality (see [OKZ98, page 70]):

0 ∈ C(x, y) + N_V(y), (13)

where y = (v, d) ∈ R^{|A|+|C|} = R^m, N_V(y) is the normal cone of the feasible set V and C(x, y) = ∇_y z(x, y), which now is a function of the design parameters x. Consider the mapping S : X → R^m defined by

S(x) = {y ∈ R^m | 0 ∈ C(x, y) + N_V(y)}. (14)

Note that S is a set-valued mapping, whose value contains all points y fulfilling the first-order optimality conditions of (3), and under the prescribed assumptions, S is a single-valued mapping. The full traffic assignment solution (h, v, d) is not unique, but we are now only considering y = (v, d). The set V of feasible link flows and demands is still a polyhedron in (v, d), and we can regard the route flow vector h as an auxiliary variable for being able to represent this polyhedron. Now, the network design problem (12) can be written as the following MPEC problem:

minimize_{(x,y)} F(x, y),
subject to x ∈ X, (15)
y ∈ S(x).

The formulation that is focused on in this work is the implicit programming formulation. A vector-valued function σ will now be introduced, which will be used in place of the set-valued mapping S.

Definition 2. Since S is single-valued, we define a function σ : X → R^m for which y = σ(x) and y ∈ S(x).

The network design problem can then be stated as to

minimize_x F(x, σ(x)), (16)
subject to x ∈ X.
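The implicit programming idea can be sketched on a toy instance. The one-link "network" below, its elastic demand, the closed-form equilibrium inside sigma, and the upper-level objective (negative toll revenue) are all hypothetical stand-ins; a real implementation would evaluate σ(x) by calling a full TAP solver such as DSD:

```python
def sigma(x):
    """Toy stand-in for the TAP solver defining sigma: one tolled link with
    cost t(v) = 1 + v + x and inverse demand D^{-1}(d) = 10 - d (made up).
    The equilibrium 1 + d + x = 10 - d gives d = (9 - x) / 2 in closed form."""
    d = max((9.0 - x) / 2.0, 0.0)
    return d, d  # (link flow v, demand d)

def F(x):
    """Hypothetical upper-level objective: negative toll revenue -x * v."""
    v, _ = sigma(x)
    return -x * v

# Upper level: minimize F(x, sigma(x)) over X = [0, 9]; a plain grid search
# stands in for the descent and bundle methods used later in this work.
xs = [i * 9.0 / 9000 for i in range(9001)]
x_best = min(xs, key=F)
print(round(x_best, 3))  # 4.5, the revenue-maximizing toll in this toy
```

Note that each upper-level function evaluation triggers one lower-level equilibrium computation, which is exactly the structure of formulation (16).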

The function σ is evaluated by solving the traffic assignment problem (3) with modified cost functions according to the values of x. This formulation forms the basis for Approach I (Section 7) to solving the network design problem. Another representation can be obtained. We can replace the constraint y ∈ S(x) by an explicit form of the variational inequality, so that we get the program:

minimize_{(x,y)} F(x, y),
subject to x ∈ X, (17)
∇_y z(x, y)^T (ŷ − y) ≥ 0, ∀ŷ ∈ V.

This formulation forms the basis for Approach II (Section 8) to solving the network design problem, where the variational inequality is represented by finitely many inequalities and a cutting constraint (or cutting plane) method is used to add the inequalities that are needed. A difference between the implicit programming approach and the approach with the explicitly stated variational inequality is that the set V in the latter is incorporated in the program. The equilibrium constraints can also be formulated as complementarity constraints. A discussion on practical issues regarding the representation of V follows. If the constraint y ∈ S(x) is replaced by the first-order optimality conditions (6), we get a mathematical program with complementarity constraints (MPCC), which can be formulated as a nonlinear program (NLP). In general for an MPCC, the Mangasarian-Fromovitz constraint qualification (MFCQ) does not hold at any feasible point ([SS00]), which might lead to non-existence of Lagrange multipliers in the KKT optimality conditions. For that reason, one cannot in general hope to get good results from plugging the problem into a general NLP solver. In practice, it has been shown ([FL04]) to be possible to use NLP solvers (SQP solvers have been shown to be particularly good) on MPCC problems, but for our problem the size of an NLP formulation in terms of the number of variables grows very fast with the network size. This is also true for the explicit variational inequality formulation in (17) above. The number of variables needed depends on how the set of feasible flows V is chosen to be represented. A large network, such as Barcelona (99 origin nodes, 7922 OD pairs, 2522 links), would need 7922 × 2522 ≈ 20 million variables for defining V using the link-node representation, and 99 × 2522 ≈ 250 thousand variables using the origin aggregate link-node representation. For the link-route representation, it is not as easy to say how many variables are needed.
But when examining the results from a route based traffic assignment problem solver (OBA [BG02; BG10]) on the Barcelona network, the number of used routes per OD pair didn't exceed 11 and the total number of routes was 11988. So, if a column generation technique is used, then the link-route representation requires approximately 12000 variables for representing V. However, during the process it is very probable that columns are generated that are not needed in the final solution, and the number of actual variables will then be larger, maybe not even in the same order of magnitude. For the implicit programming approach the set V is represented in the traffic assignment solver, and the solver of the upper level problem doesn't need to take V into account. In the case of stochastics (which will be discussed in Section 6.7), the total set V will be a Cartesian product of several scenario specific sets V, and disaggregation on these is necessary for a scalable implementation. We will here define another assumption which characterizes well-behaved network design problems. The continuous differentiability of C is necessary for the following stability analysis, while the directional differentiability of F will be used in the development of the SMPEC, and also for being able to compute subgradients of F.

Assumption 3. (Well-Behaved Network Design Problem)

(i) C is continuously differentiable w.r.t. x and y.

(ii) F is directionally differentiable w.r.t. x and y.

6.3 Stability and Subgradients of Traffic Assignment Solution

In this section, conditions for the stability of the user equilibrium, i.e., S(x), w.r.t. the design parameters will be established. A discussion on how to compute Clarke subgradients (to be defined) of S follows after that. Recall that S is a set-valued mapping, whose value contains all points fulfilling the first-order optimality conditions of (3), but under the prescribed assumptions (Assumptions 1 and 2), S is a single-valued mapping and S(x) = y. This enables us to define a concept of strong regularity without involving the intersection of S with a small neighbourhood of a solution y∗, as is done in [OKZ98, page 86], where a reference pair (x∗, y∗) is introduced in order to focus on one solution y∗ among several. The reason this is necessary in [OKZ98] is that the solution y is not assumed to be globally unique, or equivalently: S is not assumed to be single-valued. However, in our case y∗ is unique, and our reference point will be only x∗.

Definition 3. If S is single-valued and Lipschitz continuous on some neighbourhood of x = x∗, we say that it is strongly regular at x∗.

Definition 4. If S is strongly regular at all x in X, we say that it is strongly regular.

A strongly regular mapping S reflects stability of the traffic assignment solution w.r.t. perturbations in design x, through the Lipschitz continuity. From [OKZ98, Theorem 5.3] we have a practical condition for strong regularity:

Theorem 1. If Assumption 3.(i) holds and V is polyhedral, then the following two statements are equivalent:

(i) S(x) is strongly regular at x = x∗.

(ii) The generalized equation

ξ ∈ ∇_y C(x∗, y∗)y′ + N_K(y′) (18)

has a unique solution y′ for all ξ ∈ R^m.

Here ∇_y C(x∗, y∗) denotes the Jacobian of C w.r.t. y, which is the Hessian in y of z at (x∗, y∗). The set K is the critical cone of V at y∗, i.e.,

K = {y′ ∈ R^m | y′ ∈ T_V(y∗) and y′^T C(x∗, y∗) = 0}, (19)

where T_V(y∗) is the tangent cone of V at y∗. In our application, V certainly is polyhedral, and conditions for strong regularity of S can be established by the uniqueness conditions for the generalized equation (18) from Theorem 1 above. The generalized equation (18) can be equivalently rewritten as the quadratic program

minimize_{y′} −ξ^T y′ + (1/2) y′^T ∇_y C(x∗, y∗) y′, (20)
subject to y′ ∈ K,

where the solution y′ can be interpreted as the response to a perturbation ξ. A more explicit form of K will be needed for drawing conclusions regarding uniqueness of solutions to (18). For a y′ to belong to K it must: (i) belong to the tangent cone of V at y, and (ii) be orthogonal to the gradient C of our objective function. The solution y′ to the generalized equation (18) can be seen as the response after a perturbation ξ, and the K-membership of y′ is the condition that the response should still be (i) feasible and (ii) optimal. Now we will define the critical cone K in terms of linear inequalities and equalities, in more explicit terms of link flows, route flows and demands. We begin with finding T_V(y∗) and then continue to find

the intersection with C^⊥ = {y′ ∈ R^m | y′^T C(x∗, y∗) = 0}. The tangent cone T_V(y∗) is defined as the set of feasible directions at the point y∗, that is

T_V(y∗) = { y′ = (v′, d′) ∈ R^m | ∃h′ : Γ^T h′ = d′, Λh′ = v′,
h′_{kr} free, if h∗_{kr} > 0, ∀r ∈ R_k, k ∈ C,
h′_{kr} ≥ 0, if h∗_{kr} = 0, ∀r ∈ R_k, k ∈ C }.

Now we will intersect T_V(y∗) with C^⊥. Using that y′ ∈ T_V(y∗), we can say that Λh′ = v′ and Γ^T h′ = d′. We also use the first-order optimality condition (6c) and Assumption 1.(ii) (d_k > 0). We get for y′ ∈ C^⊥:

0 = y′^T C(x∗, y∗)
= (Λh′)^T ∇_v z(x∗, v∗, d∗) + d′^T ∇_d z(x∗, v∗, d∗)
= Σ_{k∈C} Σ_{r∈R_k} h′_{kr} c∗_{kr} − Σ_{k∈C} d′_k D_k^{-1}(d∗_k)
= Σ_{k∈C} Σ_{r∈R_k} h′_{kr} c∗_{kr} − Σ_{k∈C} d′_k π∗_k
= Σ_{k∈C} Σ_{r∈R_k} h′_{kr}(c∗_{kr} − π∗_k). (21)

Since c∗_{kr} − π∗_k = 0 for h∗_{kr} > 0 (from (6a)), the h′_{kr} that are free in T_V(y∗) are still free in the critical cone. Remaining in the sum are the terms for which h∗_{kr} = 0, and since c∗_{kr} − π∗_k ≥ 0 for those (from (6b)), all terms are non-negative. This means all terms must be equal to zero to sum up to zero and fulfill the equation in (21), i.e., the condition

h′_{kr}(c∗_{kr} − π∗_k) = 0

is added to those in T_V(y∗) to form K, which becomes

K = { y′ = (v′, d′) ∈ R^m | ∃h′ : Γ^T h′ = d′, Λh′ = v′,
h′_{kr} free, if h∗_{kr} > 0, ∀r ∈ R_k, k ∈ C,
h′_{kr} ≥ 0, if h∗_{kr} = 0 and c∗_{kr} − π∗_k = 0, ∀r ∈ R_k, k ∈ C,
h′_{kr} = 0, if h∗_{kr} = 0 and c∗_{kr} − π∗_k > 0, ∀r ∈ R_k, k ∈ C }. (22)

To sum up the previous discussion, we have from Theorem 1 that the solution mapping S is strongly regular if the quadratic program (20) has a unique solution for all ξ ∈ R^m. In Definition 2, the function σ was introduced. Theorem 6.3 from [OKZ98], which is useful for establishing directional differentiability of σ(·), is recited in our context below.

Theorem 2. If S is strongly regular at x, then the function σ is directionally differentiable for all x ∈ X, and for each direction x′, the directional derivative σ′(x; x′) is the unique solution of the generalized equation in the variable y′:

0 ∈ ∇_x C(x, y)x′ + ∇_y C(x, y)y′ + N_K(y′).

The theorem says that the directional derivative σ′(x; x′) is given by the solution to the quadratic program (20) with ξ = −∇_x C(x, y)x′. This is of practical importance, because there are efficient methods for solving quadratic programs, and hence for obtaining directional derivatives of σ. A sufficient condition for (20) to have a unique solution and for S being strongly regular is given by the following theorem ([OKZ98, Theorem 5.4]):

Assumption 4. (Strong Regularity of Traffic Assignment Solution) ∇_y C(x∗, y∗) is positive definite on the linear subspace K − K. The notation with the minus between sets is defined as follows.

Definition 5. For a set A ⊆ Rm of vectors, A − A is the set {(a − b) ∈ Rm | a, b ∈ A}.

Theorem 3. Let Assumptions 4 and 3.(i) hold. Then the quadratic program (20) has a unique solution y′∗ for all ξ ∈ R^m, and S is strongly regular at (x∗, y∗).

Note that K − K is a linear subspace, since K is a polyhedral cone. The theorems and definitions above regarding strong regularity were based on the function C, the feasible set V and the critical cone K that are related to the variables y = (v, d). In our applications, some of the diagonal elements of ∇_y C(x, y) will be zero (corresponding to links with ∂t_a(v_a)/∂v_a = 0), which will make it impossible to directly use Theorem 3 to assert strong regularity. In Section 6.5 this problem is discussed in detail.

6.4 Computing Subgradients of Traffic Assignment Solutions

First we introduce the concept of Clarke subgradient from nonsmooth analysis. The Clarke directional derivative of a function f at a point x ∈ R^n in the direction y ∈ R^n is defined as

f°(x; y) = lim sup_{z→x, λ→0+} (f(z + λy) − f(z)) / λ,

where f°(x; y) for directions y and −y can be interpreted as extremal directional derivatives of f along the direction y close to x. The Clarke generalized gradient or Clarke subdifferential is defined as the set

∂f(x) = {x∗ ∈ R^n | f°(x; y) ≥ y^T x∗ for all y ∈ R^n}, (23)

where a member x∗ is called a Clarke subgradient. The Clarke directional derivatives (which were above interpreted as extremal directional derivatives at nearby points) work as the support function of the set of Clarke subgradients. Note that when f is differentiable at x, f°(x; y) = −f°(x; −y) for all y and only ∇f(x) is in ∂f(x). Clarke subdifferentials can be used to classify points of a nonsmooth function as stationary points. Consider the problem to

minimize_x f(x), (24)
subject to x ∈ X.

Under the assumption that f is Lipschitz continuous, and that the Mangasarian-Fromovitz constraint qualification (MFCQ) holds for all x ∈ X = {x ∈ R^n | g_i(x) ≤ 0, i = 1, . . . , p}, where each function g_i is continuously differentiable¹, there are necessary optimality conditions (see [CP10a]) for local extreme points similar to the first-order necessary optimality conditions of smooth optimization, namely

0 ∈ ∂f(x) + ∇_x g(x)µ, (25)

and

0 ≤ µ ⊥ g(x) ≤ 0, (26)

where µ is a vector of Lagrange multipliers. A point x fulfilling (25) and (26) is called Clarke stationary. Bundle methods rely on this theory for their proofs of convergence. In this work we will use a bundle method, and this section discusses how a Clarke subgradient for our specific problem can be computed. For the remainder of the text, by subgradient is meant Clarke subgradient. The vector-valued function σ (from Definition 2) is Lipschitz continuous and directionally differentiable, and the directional derivatives are computed by solving (20) with ξ = −∇_x C(x, y)x′. We will refer to this problem as the sensitivity analysis problem. If the feasible set K of (20) is a linear subspace and consists only of linear equalities (not even affine equalities), then its solution can be obtained by solving a system of linear equations. Under such circumstances, the directional derivative y′ = σ′(x; x′) will be a linear mapping of the direction x′, i.e., one can write:

−∇_x C(x∗, y∗)x′ = ∇_y C(x∗, y∗)y′, y′ ∈ K,

¹ MFCQ holds for this case if there exists a d ∈ R^n such that ∇g_i(x)^T d < 0 for all i fulfilling g_i(x) = 0.

which qualifies as a linear mapping since strong regularity is assumed at (x∗, y∗), and hence y′ is unique. So, if y′ = σ′(x; x′) is a linear mapping in x′ at x = x∗, σ is differentiable at x = x∗. The Jacobian of σ is computed by solving for the directional derivative y′ in each coordinate direction and assembling the Jacobian matrix. When is K a linear subspace? We define three index sets

Π¹_k = {r | h∗_{kr} > 0},
Π²_k = {r | h∗_{kr} = 0 and c∗_{kr} = π∗_k},
Π³_k = {r | h∗_{kr} = 0 and c∗_{kr} > π∗_k},   [r ∈ R_k, k ∈ C]. (27)
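The classification (27) can be sketched directly; the route flows, costs and minimal cost below are hypothetical toy data for a single OD pair:

```python
def classify_routes(h, c, pi, tol=1e-9):
    """Partition route indices into the sets Pi^1, Pi^2, Pi^3 of (27) for one
    OD pair, given route flows h, route costs c and the minimal cost pi."""
    pi1 = [r for r, (hr, cr) in enumerate(zip(h, c)) if hr > tol]
    pi2 = [r for r, (hr, cr) in enumerate(zip(h, c))
           if hr <= tol and abs(cr - pi) <= tol]
    pi3 = [r for r, (hr, cr) in enumerate(zip(h, c))
           if hr <= tol and cr - pi > tol]
    return pi1, pi2, pi3

# Route 1 is unused but exactly as cheap as the used route 0, so Pi^2 = {1}.
p1, p2, p3 = classify_routes([4.0, 0.0, 0.0], [5.0, 5.0, 8.0], pi=5.0)
print(p1, p2, p3)  # [0] [1] [2]
```

A nonempty Π²_k, as in this toy data, is precisely the case in which K may fail to be a linear subspace.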

By identification between the index sets in (27) and (22), we can see that K is a linear subspace when Π²_k is empty for all k ∈ C. When this holds for K at (x∗, y∗), we say that (x∗, y∗) is a strictly complementary solution. If it doesn't hold, we have a non-strictly complementary solution. An interpretation of a strictly complementary solution in terms of traffic is that we have strict complementarity when there are no routes that are unused but still as good as those that are used. The point of the above discussion is that we can guarantee differentiability, and hence the possibility of computing a Jacobian, at points where we have strict complementarity. Actually, we possibly have differentiability at more solutions than those that are strictly complementary. In [Pat04] this topic is discussed and conditions for differentiability at non-strictly complementary solutions are presented. However, numerical experiments have shown that σ certainly is not differentiable at some points (see e.g., Figure 20) and we have to expect to meet this situation. Below follow two guidelines that have been used when the numerical methods have been implemented.

Guideline 1. (Subgradient at Differentiable Points) At differentiable points, the Clarke subdifferential is a singleton with the gradient as the only member and it can be computed by computing the directional derivative in the coordinate directions to assemble the Jacobian matrix.

Now we turn to the question of nondifferentiable points. The function σ is Lipschitz continuous. Hence Rademacher’s theorem can be applied.

Theorem 4. (Rademacher's Theorem) Let X̂ ⊂ R^n be open, and let f : X̂ → R^m be Lipschitz. Then f is differentiable at almost every point in X̂.

For our purposes, the assumption that X̂ is open constitutes no obstacle, since we will be able to extend our set X (i.e., decrease x_L and increase x_U by some constant ε > 0) without breaking any assumptions. We now use Rademacher's theorem to motivate the idea that whenever a Clarke subgradient is requested by a numerical method, we will still compute directional derivatives for all coordinate directions and assemble a Jacobian matrix. By Rademacher's theorem, it is very unlikely that an iterative numerical method happens to choose exactly one of the points in the measure-zero set that are nondifferentiable. One can object that the finite numerical precision of methods for TAP solutions will smear out these measure-zero sets so that they become severe for the numerical algorithms. This will be seen as a numerical precision problem rather than a methodological problem. If we had a computer with infinite numerical precision, a numerical method would never hit a nondifferentiable point, except perhaps in the limit. Bundle methods are constructed under these conditions and can still be shown to converge globally to stationary points (see e.g., [BLO05; Haa04]). Nevertheless, one has to expect numerical difficulties close to nondifferentiable points. So, we will have the following implementation guideline for nondifferentiable points:

Guideline 2. (Subgradient at Nondifferentiable Points) A numerical method requesting a subgradient will with probability one be requesting it at a differentiable point, so no guideline is needed for nondifferentiable points.

The sensitivity analysis problem is in general a quadratic program of the following form when C is translated into the traffic assignment problem terms t and D:

minimize_{(v′,d′)} [∇_x t(x∗, v∗)x′]^T v′ + (1/2) Σ_{a∈A} ∂t_a(x∗, v∗_a)/∂v_a (v′_a)² − [∇_x D^{-1}(x∗, d∗)x′]^T d′ − (1/2) Σ_{k∈C} ∂D_k^{-1}(x∗, d∗_k)/∂d_k (d′_k)², (28)
subject to (v′, d′) ∈ K.

This problem is a quadratic reformulation of the generalized equation in Theorem 2 and can be inter- preted as a quadratic approximation of a perturbed traffic assignment problem in the vicinity of the solution (x∗, y∗).

6.5 Uniqueness and Strong Regularity in Practice

If the design parameters don't change the shape of the cost functions too much, then regardless of whether we use BPR cost functions or TU71 cost functions, elastic demands with the MNL model or fixed demand, we always fulfill Assumption 1 for existence of solutions to the traffic assignment problem. Assumption 3.(i) will also hold, again given that the design parameters are introduced nicely. What about uniqueness? And what about strong regularity of the traffic assignment solution mapping? These are the questions addressed by Assumptions 2 and 4. Our elastic demand model (MNL) does not introduce difficulties fulfilling Assumptions 2 or 4, and we can concentrate on the cost functions. For the TU71 cost functions, we have a lower bound greater than zero (namely $1/c_{a,1}$) on $\partial t_a/\partial v_a$ (see (9)); hence both Assumptions 2 and 4 are fulfilled.

Problems arise when using BPR cost functions, since $\partial t_a(v_a)/\partial v_a = 0$ for $v_a = 0$. Assumption 2 holds, but one cannot directly say that Assumption 4 does. Introducing OD connectors and centroids makes some cost functions constant, i.e., $\partial t_a/\partial v_a \equiv 0$. In this case, neither Assumption 2 nor 4 holds. In the cases of MNL and TU71, the validity of the two assumptions could be determined immediately, since in the case of Assumption 2 the vector-valued function

$$\begin{pmatrix} t(v) \\ -D^{-1}(d) \end{pmatrix}$$

is strictly monotone on the whole $\mathbb{R}^m \supset V$, and for Assumption 4, $\nabla_y C(x, y)$ is positive definite on the whole $\mathbb{R}^m \supset K - K$. The sets $V$ and $K - K$ are defined by the structure of the traffic network. The development below will result in three theorems, which take the structure of the network into account to establish uniqueness of the traffic assignment solution and strong regularity for OD connectors and BPR functions. In the case of OD connectors, Theorems 6 and 8 are quite strong and can guarantee our two assumptions to hold. For the BPR functions, Theorems 8 and 9 show that it is likely that strong regularity holds. A key concept here is that of a forest from graph theory:

Definition 6. A graph G with m links, n nodes and c components (disjoint connected subgraphs) is called a forest if m − n + c = 0. This is equivalent to saying G contains no cycles.
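Definition 6 is easy to verify computationally. The sketch below, a hypothetical helper not taken from the thesis, counts components with a union-find structure and checks $m - n + c = 0$; link direction is ignored, as in the undirected subgraphs discussed below.

```python
# A small check of Definition 6: a graph with m links, n nodes and c
# components is a forest iff m - n + c = 0, which is equivalent to
# containing no cycles. Components are counted with union-find.

def num_components(nodes, links):
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (i, j) in links:
        parent[find(i)] = find(j)
    return len({find(u) for u in nodes})

def is_forest(nodes, links):
    m, n = len(links), len(nodes)
    return m - n + num_components(nodes, links) == 0

# A path 1-2-3 plus an isolated node 4: no cycle, hence a forest.
assert is_forest([1, 2, 3, 4], [(1, 2), (2, 3)])
# Adding the link (3, 1) closes a cycle, so the forest property is lost.
assert not is_forest([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1)])
```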

Lemma 5. Let $\mathcal{A}_0 \subset \mathcal{A}$ be such that the subgraphs $G_{0k} = (\mathcal{N}, \mathcal{A}_0 \cap \mathcal{A}_k)$ of $G$ are forests for all $k \in \mathcal{C}$. Further, let $y' \in \operatorname{span}(V - V)$, with $y' = (v', d') = (v_0', v_+', d')$, where the link flows have been reordered without loss of generality, $v_0'$ is the vector of link flows for links in $\mathcal{A}_0$, and $v_+'$ are the flows for the remaining links in $\mathcal{A}$. Then $v_+' = 0$ and $d' = 0$ implies $v_0' = 0$.

Proof. Let $y' = (v_0', 0, 0)$. Define $T_{G^1 \to G^2} : \mathbb{R}^{|\mathcal{A}(G^1)|} \to \mathbb{R}^{|\mathcal{A}(G^2)|}$ as the linear mapping that maps a link flow $v_{G^1}$ on $G^1$ to $T_{G^1 \to G^2}(v_{G^1})$ on $G^2$, such that $T_{G^1 \to G^2}(v_{G^1})$ and $v_{G^1}$ are equal on all common links $\mathcal{A}(G^1) \cap \mathcal{A}(G^2)$, and $T_{G^1 \to G^2}(v_{G^1})$ is zero on all links only in $G^2$ (i.e., $\mathcal{A}(G^2) \setminus \mathcal{A}(G^1)$).

We introduce the set

$$\tilde{V} = \left\{ (v, d) \in \mathbb{R}^m \;\middle|\; \exists v_k \geq 0 :\; v = \sum_{k\in\mathcal{C}} T_{G^k\to G}(v_k),\;\; M_k v_k = e_i d_k - e_j d_k,\ \forall k = (i,j) \in \mathcal{C} \right\},$$

where $e_i$ is the $i$-th unit vector; this is the set of all feasible link flows and demands allowing for cycles in the commodity flows. In $V$, cycle flows are absent thanks to the routes being cycle-free. Since all link flows and demands in $V$ must fulfill the conservation of flow at each node, we have that $V \subset \tilde{V}$. We will continue to work with $\tilde{V}$. Further,

$$\operatorname{span}(\tilde{V} - \tilde{V}) = \left\{ (v', d') \in \mathbb{R}^m \;\middle|\; \exists v_k' :\; v' = \sum_{k\in\mathcal{C}} T_{G^k\to G}(v_k'),\;\; M_k v_k' = e_i d_k' - e_j d_k',\ \forall k = (i,j) \in \mathcal{C} \right\}.$$

Let $M_{0k}$ be the incidence matrix for the graph $G_{0k}$. The set of possible $y' = (v_0', 0, 0)$ in $\operatorname{span}(\tilde{V} - \tilde{V})$ then is

$$\left\{ v_0' \in \mathbb{R}^{|\mathcal{A}_0|} \;\middle|\; \exists v_{0k}' :\; v_0' = \sum_{k\in\mathcal{C}} T_{G_{0k}\to G_0}(v_{0k}'),\;\; M_{0k} v_{0k}' = 0,\ \forall k \in \mathcal{C} \right\}.$$

From [Big74, Definition 4.4 and Theorem 4.5] we have that $\dim N(M_{0k}) = m_{0k} - n_{0k} + c_{0k}$, where $m_{0k}$ is the number of links, $n_{0k}$ is the number of nodes and $c_{0k}$ is the number of components in $G_{0k}$. Using the definition of a forest and the assumption that $G_{0k}$ is a forest, we conclude that the dimension of the null space is zero, i.e., $N(M_{0k}) = \{0\}$, implying $v_{0k}' = 0$, and further (since $T$ is linear) $v_0' = 0$. Hence, $y' = (0, 0, 0)$.
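The rank fact used in the proof can be illustrated numerically: for a node-link incidence matrix $M$ (one column per link), $\dim N(M) = m - n + c$, so a forest has a trivial null space. The triangle graph below is a hypothetical example; numpy's `matrix_rank` does the rank computation.

```python
# dim N(M) = (number of columns) - rank(M) for the node-link incidence
# matrix; on a forest this is zero, on a cycle it is positive.
import numpy as np

def incidence_matrix(n_nodes, links):
    """M[i, a] = -1 at the tail and +1 at the head of link a."""
    M = np.zeros((n_nodes, len(links)))
    for a, (i, j) in enumerate(links):
        M[i, a] = -1.0
        M[j, a] = 1.0
    return M

# Triangle on 3 nodes: m = 3, n = 3, c = 1, so dim N(M) = 1 (one cycle).
M = incidence_matrix(3, [(0, 1), (1, 2), (2, 0)])
assert M.shape[1] - np.linalg.matrix_rank(M) == 3 - 3 + 1

# Removing one link leaves a tree (a forest): the null space is trivial.
M_tree = incidence_matrix(3, [(0, 1), (1, 2)])
assert M_tree.shape[1] - np.linalg.matrix_rank(M_tree) == 0
```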

We now turn to the special case of OD connectors, which are characterized by their constant (often zero) cost functions. If OD connectors exist in the network, Assumption 2 is not fulfilled in general (since their cost functions aren't separably strictly monotone) and uniqueness of a traffic assignment solution cannot be directly established. However, if OD connectors are used as "intended", Assumption 2 will still hold, given that the remaining cost functions and demand functions are strictly monotone. The following theorem states a condition for when OD connectors can be present in a network without breaking the property of uniqueness.

Theorem 6. (Uniqueness of Traffic Assignment Solutions for Well-Placed OD Connectors) Let $\mathcal{A}_0 = \{a \in \mathcal{A} \mid t_a \text{ is constant}\}$, and suppose that the subgraphs $(\mathcal{N}, \mathcal{A}_0 \cap \mathcal{A}_k)$ are forests for all $k \in \mathcal{C}$. Further, let $D_k^{-1}$ be strictly monotone for all $k \in \mathcal{C}$ and $t_a$ be strictly monotone for all $a \in \mathcal{A}_+ = \mathcal{A} \setminus \mathcal{A}_0$. Then Assumption 2 is fulfilled.

Proof. Let

$$C(y) = \begin{pmatrix} t(v) \\ -D^{-1}(d) \end{pmatrix},$$

where $y = (v, d) = (v_0, v_+, d)$, $v_0$ are the link flows on the links in $\mathcal{A}_0$ and $v_+$ are the remaining link flows. This means the links might have been reordered (without loss of generality) to make the above decomposition possible. Let $y_1, y_2 \in V$, $y_1 \neq y_2$ and $y' = y_2 - y_1 = (v_0', v_+', d') \in V - V$. We identify two cases:

(i) Suppose that at least one component of $(v_+', d')$ is nonzero. This implies strict monotonicity,

$$(C(y_2) - C(y_1))^T (y_2 - y_1) > 0,$$

since the first $|\mathcal{A}_0|$ components of $C(y_2) - C(y_1)$ are zero, and the remaining component functions are strictly increasing.

(ii) Suppose $y' = (v_0', 0, 0)$ and $v_0' \neq 0$. By Lemma 5 there is no vector other than the zero vector of the form $y' = (v_0', 0, 0)$ in $\operatorname{span}(V - V) \supset V - V$; hence there is no second case.

Uniqueness of the traffic assignment solution can hence be established for a network with OD connectors, given that they don't form cycles in the subgraphs $G^k$. Figure 3 shows two examples of traffic networks with OD connectors. In both networks, we have two OD pairs, one going from centroid 1 to centroid 2, and one in the opposite direction. Figure 3(a) shows an example of well-placed OD connectors, for which Theorem 6 can be applied, while Figure 3(b) is an example of ill-placed OD connectors, since the OD connectors form a cycle in the undirected graph for at least one subgraph.

[Figure 3 about here: the full graphs of traffic networks I and II, each disaggregated into one subgraph per OD pair (from centroid 1 to centroid 2 and from centroid 2 to centroid 1), together with the undirected subgraph, which happens to be identical for both OD pairs. Panel (a): well-placed OD connectors. Panel (b): ill-placed OD connectors.]

Figure 3: Two examples of traffic networks with OD connectors. The OD connectors are the dashed links, while the solid lines are regular links with strictly increasing cost functions. In both examples, the undirected subgraphs happen to be identical for the two OD pairs.

The next question concerns strong regularity of the traffic assignment solution mapping S when OD connectors and BPR functions are used. Here, another theorem, similar to the previous one but based on the sufficient conditions for uniqueness of the solution to (20) in Assumption 4, will be presented. We will make use of the following lemma:

Lemma 7. The linear subspace $K - K$ at a solution $y$ to the traffic assignment problem is a subspace of $\operatorname{span}(V - V)$.

Proof. By definition, $K = T_V(y) \cap C^\perp$; hence we have $K \subseteq T_V(y)$. The tangent cone is defined as

$$T_V(y) = \{ s(\hat{y} - y) \in \mathbb{R}^m \mid \hat{y} \in V,\ s \geq 0 \}, \quad (29)$$

and by $s(V - V)$ we mean

m s(V − V ) = {s(y1 − y2) ∈ R | y1, y2 ∈ V, s ≥ 0}. ⊆ span(V − V ) (30)

Since y ∈ V we can identify y with y2 in (30) and hence see that TV (y) ⊆ s(V − V ). This implies

K − K ⊆ TV (y) ⊆ s(V − V ) ⊆ span(V − V ),

which was our statement.

Theorem 8. (Strong Regularity of Traffic Assignment Solutions for Zero-Derivatives) Let $y = (v, d)$ be a traffic assignment solution. Let $\mathcal{A}_0 = \{a \in \mathcal{A} \mid \partial t_a(v_a)/\partial v_a = 0\}$ and suppose that the subgraphs $(\mathcal{N}, \mathcal{A}_0 \cap \mathcal{A}_k)$ are forests for all $k \in \mathcal{C}$. Further, let $-\partial D_k^{-1}(d_k)/\partial d_k > 0$ for all $k \in \mathcal{C}$ and $\partial t_a(v_a)/\partial v_a > 0$ for all $a \in \mathcal{A}_+ = \mathcal{A} \setminus \mathcal{A}_0$. Then Assumption 4 is fulfilled.

Proof. Consider $y' \in K - K$. We do the same reordering of link flow variables as was done in Lemma 5, i.e., $y' = (v', d') = (v_0', v_+', d')$, where $v_0'$ is the vector of link flows for links in $\mathcal{A}_0$, and $v_+'$ are the flows for links in $\mathcal{A}_+$. We identify two cases:

(i) Suppose that at least one component of $(v_+', d')$ is nonzero. This implies

$$y'^T \nabla_y C(y) y' > 0,$$

since $\nabla_y C(y)$ is diagonal, the first $|\mathcal{A}_0|$ diagonal elements are equal to zero, and the remaining ones, by assumption, are positive.

(ii) Suppose $y' = (v_0', 0, 0)$. By Lemma 7, $y' \in K - K \subseteq \operatorname{span}(V - V)$, and Lemma 5 can be applied, saying that there is no vector other than the zero vector of the form $y' = (v_0', 0, 0)$ in $\operatorname{span}(V - V)$, and then of course not in $K - K$. Hence there is no second case.

From Theorem 8 we have that strong regularity holds for well-placed OD connectors as well. However, if we are concerned with BPR functions in our network, some links might be unused, giving a zero derivative in the cost function for those links and adding them to the set $\mathcal{A}_0$. Perhaps the unused links actually form cycles, and Theorem 8 does not save us from cycles of unused BPR-function links. For the OD connectors, we knew beforehand which these links were. We don't know in general which links are unused in a network, since the traffic assignment changes with the design variables. What the theorem says, though, is that if the unused links, together with possible OD connectors, don't form cycles, strong regularity holds.

It is possible to strengthen this assertion. In Theorem 8, we used that $K - K$ is a subspace of $\operatorname{span}(V - V)$. Actually it is smaller than $\operatorname{span}(V - V)$, since all vectors in $K - K$ must be in $C^\perp$ as well ($K - K = (T_V(y) - T_V(y)) \cap C^\perp$). The following theorem will make use of this fact to give another argument for the possibility of strong regularity when using BPR functions. First, a definition of degenerate routes:

Definition 7. Given a traffic assignment solution $(h, c, \pi)$, a route $r \in \mathcal{R}_k$ is degenerate if $h_{kr} = 0$ and $c_{kr} - \pi_k = 0$. If it is not degenerate, it is nondegenerate.

Theorem 9. (Strong Regularity of Traffic Assignment Solutions for BPR Cost Functions) Let $(v, d, \pi, c)$ be a traffic assignment solution, with $y = (v, d)$. Also, let $\mathcal{A}_\times = \{a \in \mathcal{A} \mid (c_{kr} - \pi_k) > 0\ \forall k \in \mathcal{C},\ r \in \mathcal{R}_k\ \text{s.t.}\ \delta_{kra} = 1\}$ (i.e., the set of links that are unused and belong only to nondegenerate routes) and let $\mathcal{A}_0 = \{a \in \mathcal{A} \setminus \mathcal{A}_\times \mid \partial t_a(v_a)/\partial v_a = 0\}$. Further, let $-\partial D_k^{-1}(d_k)/\partial d_k > 0$ for all $k \in \mathcal{C}$ and $\partial t_a(v_a)/\partial v_a > 0$ for all $a \in \mathcal{A}_+ = (\mathcal{A} \setminus \mathcal{A}_0) \setminus \mathcal{A}_\times$. If the subgraphs $(\mathcal{N}, \mathcal{A}_k \cap \mathcal{A}_0)$ are forests for all $k \in \mathcal{C}$, then Assumption 4 is fulfilled.

Proof. Consider $y' \in K - K$. We do a similar reordering of link flow variables as was done in Lemma 5: $y' = (v', d') = (v_0', v_\times', v_+', d')$, where $v_0'$ is the vector of link flows for links in $\mathcal{A}_0$, $v_\times'$ are the link flows for links in $\mathcal{A}_\times$, and $v_+'$ are the flows for links in $\mathcal{A}_+$. We identify three cases:

(i) Suppose that at least one component of $(v_+', d')$ is nonzero. This implies

$$y'^T \nabla_y C(y) y' > 0,$$

since $\nabla_y C(y)$ is diagonal, the first $|\mathcal{A}_0| + |\mathcal{A}_\times|$ diagonal elements are equal to zero, and the remaining ones, by assumption, are positive.

(ii) Suppose that at least one component of $v_\times'$ is nonzero. Let link $a \in \mathcal{A}_\times$ be a nonzero-flow link. We have $v_a' = \sum_{k\in\mathcal{C}} \sum_{r\in\mathcal{R}_k} \delta_{kra} h_{kr}' \neq 0$, implying that there exist $r$ and $k$ such that $h_{kr}' \neq 0$ and $(c_{kr} - \pi_k) > 0$ (from the definition of $\mathcal{A}_\times$). This is not compatible with $y' \in K - K \subset C^\perp$, since $h_{kr}'(c_{kr} - \pi_k) = 0$ for $y' \in C^\perp$ (see (21)). Hence, there is no case (ii).

(iii) Suppose $y' = (v_0', 0, 0, 0)$. By Lemma 7, $y' \in K - K \subseteq \operatorname{span}(V - V)$, and Lemma 5 can be applied, saying that there is no vector other than the zero vector of the form $y' = (v_0', 0, 0)$ in $\operatorname{span}(V - V)$, and then of course not in $K - K$. Hence, there is no case (iii).

What Theorem 9 says is that S is strongly regular at x if, in the traffic assignment solution S(x), the links with zero cost derivative that are also parts of cheapest routes (i.e., routes that are as cheap as the cheapest route in their OD pair) don't form cycles. In the case of only BPR functions (without OD connectors), we only have zero cost derivatives if a link is unused. This means that the routes that the unused links are parts of must also be unused and in addition be cheapest (i.e., degenerate) for the links to be included in $\mathcal{A}_0$. Since all our routes are cycle-free, parts of at least two degenerate routes are needed to form a cycle of unused links and cause strong regularity to break. Adding OD connectors in conjunction with BPR functions makes it easier for strong regularity to break, since unused links belonging to only one degenerate route can connect to the OD connectors and form a cycle; this can still be considered improbable for realistic networks.

[Figure 4 about here: a two-node network with link 1 carrying flow v_1 = 10 and four dashed links with cost functions 4+(v_2)², 6+(v_3)², 3+(v_4)² and 3+(v_5)²+ε; links 3, 4 and 5 form a cycle.]

Figure 4: A traffic network with two nodes and one OD pair from node 1 to node 2. The cost functions of the links are given as labels to the links, which also define the link numbers through the subscripts of the link flow variables v. The demand is 10, and only link 1 is used. The dashed lines correspond to links with zero-derivative of the cost function in the current state.

Example 3. Figure 4 shows a network with one OD pair from node 1 to node 2. The links are labeled with their respective cost functions as functions of the link flow v and a parameter ε ≥ 0. There are three routes: (1), (2, 3) and (2, 4, 5). The demand is 10, and hence only link 1 is used, since the cost of each of the other routes is at least 10. If ε = 0, then the routes (2, 3) and (2, 4, 5) are both degenerate, since they are unused and have cost 10 (which is equal to the minimum cost of any route for that OD pair). The links 2, 3, 4 and 5 of these routes go into $\mathcal{A}_0$ from Theorem 9 above, since they are not in $\mathcal{A}_\times$ and the derivatives of their cost functions are zero. In this case, links 3, 4 and 5 form a cycle and hence the graph $(\mathcal{N}, \mathcal{A}_0)$ is not a forest. However, for ε > 0, route (2, 4, 5) is no longer degenerate and hence links 4 and 5 go into $\mathcal{A}_\times$, thereby excluding them from $\mathcal{A}_0$. Then there are no cycles in the graph $(\mathcal{N}, \mathcal{A}_0)$, and Theorem 9 can be applied to establish strong regularity of the traffic assignment solution.

With the above discussion in mind, strong regularity will be assumed to hold at all points for all problems considered below: all networks with OD connectors fulfill Theorems 6 and 8, and it will be considered improbable that strong regularity fails to hold for the problems with BPR functions.
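Example 3 can be replayed in a few lines of code. The sketch below is illustrative only: the intermediate node names and the link-endpoint encoding are assumptions, but the classification of links into $\mathcal{A}_\times$ and $\mathcal{A}_0$ follows the zero-flow route costs given in the example.

```python
# For eps = 0 both unused routes are degenerate, links 2-5 all enter A0
# and links 3, 4, 5 close a cycle; for eps > 0 route (2, 4, 5) becomes
# nondegenerate, links 4 and 5 move to A_x, and (N, A0) is a forest again.

def classify_links(eps):
    # Zero-flow route costs; demand 10 is served by link 1 at the minimum
    # route cost pi = 10.
    pi = 10.0
    routes = {(2, 3): 4.0 + 6.0, (2, 4, 5): 4.0 + 3.0 + (3.0 + eps)}
    # A link is in A_x if every route containing it has cost above pi.
    a_x = {a for a in (2, 3, 4, 5)
           if all(cost > pi for r, cost in routes.items() if a in r)}
    # Unused links outside A_x have zero cost derivative: they form A0.
    a_0 = {a for a in (2, 3, 4, 5) if a not in a_x}
    return a_0, a_x

def has_cycle(nodes, links):
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (i, j) in links:
        ri, rj = find(i), find(j)
        if ri == rj:
            return True
        parent[ri] = rj
    return False

# Assumed endpoints of links 2..5: link 2 ends where links 3 and 4 start,
# and link 4 ends where link 5 starts.
ends = {2: ("n1", "m1"), 3: ("m1", "n2"), 4: ("m1", "m2"), 5: ("m2", "n2")}

for eps, cyclic in [(0.0, True), (0.5, False)]:
    a_0, _ = classify_links(eps)
    links = [ends[a] for a in sorted(a_0)]
    assert has_cycle({u for e in links for u in e}, links) == cyclic
```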

6.6 Existence of Solutions to MPEC

In previous sections the properties of the traffic assignment problem have been studied in detail. Now we turn to the problem

$$\begin{aligned}
\underset{(x,y)}{\text{minimize}}\quad & F(x, y), \\
\text{subject to}\quad & x \in X, \qquad (31)\\
& y \in S(x),
\end{aligned}$$

and determine when it has solutions. We restate [OKZ98, Proposition 1.2], which is sufficient for our needs:

Theorem 10. Let F be continuous, X be compact and S be single-valued and continuous on an open set containing X. Then (31) possesses a solution pair (x*, y*).

We will only work with continuous objective functions F, and X is defined by simple bounds and is hence compact; further, it will be possible to extend X slightly, so that S can be strongly regular on an open set containing X.

6.7 Stochastic MPEC

Suppose that some of the parameters describing the data functions (cost, demand and objective) are subject to stochastic factors. This is a way of modeling unpredictable phenomena, such as weather conditions or fluctuating demand. In such cases, one might want to minimize the expectation value, or some measure of value at risk. We will here consider a stochastic extension of the MPEC called stochastic MPEC (SMPEC).

The main purpose of this section is to motivate the validity of a discretization scheme for SMPEC problems in the context of network design. The development here is an adaptation of the theory in [CP10a] to our special problem, but with slightly weakened assumptions. First, two versions of the SMPEC problem will be presented: the expectation value problem and the conditional value-at-risk problem. Then our assumptions will be presented, followed by a short review of the robustness of the SMPEC problems. Finally, discretization schemes based on the Monte Carlo Sample Average Approximation method and Simpson's quadrature rule are presented both for the expectation value problem and the conditional value-at-risk problem.

Let the complete probability space (Ω, Θ, P) describe our stochastic factors. We might want to minimize the expectation value of an objective function under uncertainty:

$$\begin{aligned}
(\text{SMPEC}_\Omega)\qquad \underset{(x,y)}{\text{minimize}}\quad & \int_\Omega F(x, y(\omega), \omega)\, P(d\omega), \\
\text{subject to}\quad & x \in X, \\
& y(\omega) \in S(x, \omega), \quad P\text{-a.s.},
\end{aligned}$$

where P-a.s. means P-almost surely, i.e., with probability one.

Another objective is to minimize the social cost for the worst scenarios. We introduce the value-at-risk at probability level β (β-VaR(x)) as the value for which the probability that F will not exceed this value is β, i.e.,

$$\beta\text{-VaR}(x) = \min\{\gamma \mid P(F(x, y(\omega), \omega) \leq \gamma) \geq \beta\}.$$

Another measure based on β-VaR(x), which is more tractable as an objective, is called conditional value-at-risk and is defined as

$$\beta\text{-CVaR}(x) = \frac{1}{1-\beta} \int_{F(x,y(\omega),\omega) \geq \beta\text{-VaR}(x)} F(x, y(\omega), \omega)\, P(d\omega).$$

Letting β → 1 makes β-CVaR(x) → sup_{ω∈Ω} F(x, y(ω), ω), i.e., the maximal value of F for design x over all possible scenarios. For β equal to 0, β-CVaR equals the expectation value of F.

In [CP10a; RU00] an alternative expression for conditional value-at-risk is given, based on the function

$$G_\beta(x, y, \gamma) = \gamma + \frac{1}{1-\beta} \int_\Omega [F(x, y(\omega), \omega) - \gamma]_+\, P(d\omega),$$

where $[s]_+ = \max\{0, s\}$. The conditional value-at-risk then is

$$\beta\text{-CVaR}(x) = \min_\gamma G_\beta(x, y, \gamma). \quad (32)$$

Since the first term is linear in γ and the integrand in $G_\beta$ is convex in γ, $G_\beta$ is itself convex in γ. The derivation of (32) is found in [RU00].

Finally, replacing the expectation value in the objective by conditional value-at-risk, we get the problem

$$\begin{aligned}
(\text{SRPEC}_\Omega)\qquad \underset{(x,y,\gamma)}{\text{minimize}}\quad & G_\beta(x, y, \gamma) := \gamma + \frac{1}{1-\beta}\int_\Omega [F(x, y(\omega), \omega) - \gamma]_+\, P(d\omega), \\
\text{subject to}\quad & x \in X, \\
& y(\omega) \in S(x, \omega), \quad P\text{-a.s.}
\end{aligned}$$
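The reformulation (32) can be checked on an empirical distribution: minimizing $G_\beta$ over γ reproduces the average of the worst $(1-\beta)$ fraction of objective values. The scenario costs below are arbitrary illustrative numbers.

```python
# Sample-based sketch of the CVaR reformulation (32) for equal-weight
# scenarios: min over gamma of G_beta equals the tail average.

def g_beta(gamma, costs, beta):
    n = len(costs)
    return gamma + sum(max(0.0, f - gamma) for f in costs) / ((1.0 - beta) * n)

costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
beta = 0.8  # the worst 20% of 10 scenarios are the two largest costs

# For discrete distributions the minimum over gamma is attained at a
# sample point (the beta-VaR level), so searching the samples suffices.
cvar = min(g_beta(g, costs, beta) for g in costs)
assert abs(cvar - (9.0 + 10.0) / 2.0) < 1e-9

# beta = 0 recovers the plain expectation of the costs.
assert abs(min(g_beta(g, costs, 0.0) for g in costs) - sum(costs) / 10.0) < 1e-9
```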

Now we turn to the questions of existence of solutions to (SMPEC_Ω) and (SRPEC_Ω), and also the questions of stability of optimal solutions and stationary points with respect to perturbations in the probability distribution P. Theorems from [CP10a] on existence and robustness of solutions to (SMPEC_Ω) and (SRPEC_Ω) can be applied when the following assumptions hold. (The letters and numbers in parentheses indicate the corresponding assumptions in [CP10a]. Also note that the list of assumptions below is a merge of smaller lists in [CP10a].)

Assumption 5. (Stability of Stochastic MPEC I – General)

5.(i) The mapping S(x, ·) is measurable for every x (A1).

5.(ii) The set X is closed and the mapping x ↦ S(x, ω) is closed for almost every ω ∈ Ω (A2).

5.(iii) The function F is Lipschitz continuous in (x, y), measurable in ω, uniformly weakly coercive with respect to x over the set X, and bounded from below by a (Θ,P )-integrable function (A3, B1).

5.(iv) The set S(x0, ω) is nonempty for some x0 ∈ X and almost every ω ∈ Ω (A4).

5.(v) The mapping S(·, ω) is single-valued and Lipschitz continuous for each ω ∈ Ω (B2).

5.(vi) The mapping C(·, ·, ω) is continuously differentiable and C(x, ·, ω) strictly monotone on V for each x ∈ X and ω ∈ Ω.

5.(vii) $X = \{x \in \mathbb{R}^n \mid g_i(x) \leq 0,\ i = 1, \ldots, p\}$ and each function $g_i$ is continuously differentiable (B3).

5.(viii) The Mangasarian-Fromovitz constraint qualification (MFCQ) holds for all x ∈ X (B4).

5.(ix) The set X is bounded and convex (C1).

Note that Assumption 5.(v) doesn't directly correspond to B2 in [CP10a]; B2 is a bit too strong for its purpose and is only there to establish single-valuedness and strong regularity of S in Theorems 3.2, 4.2, 5.1 and 5.2 of [CP10a].

From Theorem 2.1 in [CP10a], solutions to (SMPEC_Ω) and (SRPEC_Ω) exist under Assumptions 5.(i), 5.(ii) and 5.(iii).

Let $P_k$ be a sequence of probability distributions that converges weakly to P. For each $P_k$ we define the SMPEC problems $(\text{SMPEC}_\Omega)^k$ and $(\text{SRPEC}_\Omega)^k$, which are (SMPEC_Ω) and (SRPEC_Ω) respectively with P replaced by $P_k$. Under Assumption 5, Theorems 3.1, 3.2, 4.1 and 4.2 in [CP10a] establish stability of optimal solutions and stationary points of both (SMPEC_Ω) and (SRPEC_Ω), by showing that optimal solutions and Clarke stationary points of $(\text{SMPEC}_\Omega)^k$ and $(\text{SRPEC}_\Omega)^k$ converge to optimal solutions and Clarke stationary points of (SMPEC_Ω) and (SRPEC_Ω), respectively. The theorems are not restated here, because they are just an intermediate step on our way towards the discretization schemes.

A discretization approach that can be used is the Monte Carlo technique known as Sample Average Approximation (SAA), which makes it possible to approximate (SMPEC_Ω) and (SRPEC_Ω) numerically by replacing the continuous probability distribution P with a discrete distribution generated through sampling from P.

SAA is a so-called exterior approach to solving a stochastic optimization problem. This means the samples are taken outside of the optimization problem, or outside the method for solving it. The problem is hence transformed from a stochastic problem to a deterministic one, and one can use deterministic methods for solving it that are unaware of the original stochastic nature of the problem. The alternative to the exterior approach is the interior approach, in which the stochastic variables are not realized before the solution procedure starts. Instead, the solution procedure is specially made for stochastic problems and draws the samples during the solution process ([Sha03]). The convergence rate for the objective function value of an SAA scheme is $O(N^{-1/2})$, which is typical for Monte Carlo methods. Note that this convergence rate is independent of the number of stochastic factors ([Sha03]).
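The $O(N^{-1/2})$ Monte Carlo rate behind SAA can be observed directly. The sketch below (seed, sample sizes and integrand are arbitrary choices, not taken from the thesis) estimates $E[\omega^2]$ for $\omega \sim U(0,1)$, whose true value is $1/3$, and checks that the root-mean-square error shrinks roughly tenfold when the sample size grows hundredfold.

```python
# RMSE of a sample average over repeated replications, to exhibit the
# O(N^{-1/2}) decay of the Monte Carlo error.
import random

def rmse_of_sample_average(n_samples, n_reps, seed=0):
    rng = random.Random(seed)
    true_value = 1.0 / 3.0
    sq_errors = []
    for _ in range(n_reps):
        est = sum(rng.random() ** 2 for _ in range(n_samples)) / n_samples
        sq_errors.append((est - true_value) ** 2)
    return (sum(sq_errors) / n_reps) ** 0.5

err_small = rmse_of_sample_average(10, 200)
err_large = rmse_of_sample_average(1000, 200)
# The expected error ratio is about 10; a factor 3 is asserted for margin.
assert err_large < err_small / 3.0
```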

The SAA discretization is performed by drawing N independent samples $\omega^1, \ldots, \omega^N$ from P and solving a deterministic problem:

$$(\text{SMPEC})^N\qquad \begin{aligned}
\underset{(x,(y^1,\ldots,y^N))}{\text{minimize}}\quad & \hat{F}_N := \frac{1}{N}\sum_{k=1}^N F(x, y^k, \omega^k), \\
\text{subject to}\quad & x \in X, \\
& y^k \in S(x, \omega^k), \quad k = 1, \ldots, N,
\end{aligned}$$

or

$$(\text{SRPEC})^N\qquad \begin{aligned}
\underset{(x,(y^1,\ldots,y^N),\gamma)}{\text{minimize}}\quad & \hat{G}_N := \gamma + \frac{1}{1-\beta}\,\frac{1}{N}\sum_{k=1}^N [F(x, y^k, \omega^k) - \gamma]_+, \\
\text{subject to}\quad & x \in X, \\
& y^k \in S(x, \omega^k), \quad k = 1, \ldots, N,
\end{aligned}$$

depending on whether we want to solve an (SMPEC_Ω) or an (SRPEC_Ω). The following two theorems correspond to Theorems 5.1 and 5.2 in [CP10a] respectively and establish the convergence of solutions of $(\text{SMPEC})^N$ and $(\text{SRPEC})^N$ to solutions of (SMPEC_Ω) or (SRPEC_Ω) as N → ∞.

Theorem 11. (Convergence of Optimal Solutions to $(\text{SMPEC})^N$) Let Assumption 5 hold. For each N, let $(x_N, y_N)$ be an optimal solution to $(\text{SMPEC})^N$. Then, each limit point (there is at least one) of the sequence $\{x_N\}$ is an optimal solution to (SMPEC_Ω).

Starting from the bottom of the list of assumptions in Assumption 5, we can directly say that for our box-bounded problem (recall the definition of X in Section 6), Assumptions 5.(vii–ix) hold. Further, if our deterministic assumptions, Assumptions 1, 2, 3 and 4, hold for every ω ∈ Ω, then Assumptions 5.(ii, iv–vi) hold as well. In this work, only uniform distributions will be used, and hence Ω is compact. Assumption 5.(i) can then be asserted if, further, ω is regarded as a perturbation parameter and S(·, ·) is strongly regular. This requires C(x, y, ·) to be continuously differentiable w.r.t. ω. These two conditions make S(x, ·) Lipschitz continuous in ω and hence measurable over the compact set Ω. For Assumption 5.(iii), Lipschitz continuity of F in (x, y) is given by Assumption 3.(ii), and weak coercivity by the compactness of X. Again, using that we are only concerned with uniform distributions, we have the compactness of Ω, and if we assume that F(x, y, ·) is continuously differentiable w.r.t. ω, then F is measurable in ω and also bounded from below by a constant.
We summarize the above discussion on assumptions with yet another assumption definition, which can replace Assumption 5 for our purposes:

Assumption 6. (Stability of Stochastic MPEC I – Specialized) Assumptions 1, 2, 3 and 4 hold for every ω ∈ Ω. Furthermore, Ω is compact, and C(x, y, ·) and F(x, y, ·) are continuously differentiable w.r.t. ω.

For the convergence of stationary solutions, we define another assumption.

Assumption 7. (Stability of Stochastic MPEC II) The function F(·, σ(·, ω), ω) is regular (i.e., F is directionally differentiable and the directional derivative coincides with the Clarke directional derivative) at x for almost every ω ∈ Ω.

Theorem 12. (Convergence of Stationary Solutions to $(\text{SMPEC})^N$) Let Assumptions 5 and 7 hold. For each N, let $(x_N, y_N)$ be a stationary solution to $(\text{SMPEC})^N$. Then, each limit point (there is at least one) of the sequence $\{x_N\}$ is a stationary solution to (SMPEC_Ω).

The above theorem has a major drawback: it relies on Assumption 7, which asserts regularity of F. This assumption is broken in our applications. We will more or less ignore that we don't fulfill that assumption, based on the following argument. The assumption is a sufficient condition for convergence of stationary solutions, but not a necessary one. If, for some N, a local method has found a local minimum, we'll assume that this local minimum is an approximation of a local minimum of the true problem. A heuristic motivation for this is that the discretized objective function converges uniformly to the true stochastic objective function, and the cases where a local minimum vanishes as N → ∞ are considered unusual or pathological if N is large enough.

The two above theorems are also applicable to the conditional value-at-risk problems (SRPEC_Ω) and $(\text{SRPEC})^N$ ([CP10a]).

For problems with only one stochastic factor with uniform distribution, there are more efficient ways to compute the integral in the objective function than Monte Carlo sampling. If the objective function is smooth in ω, one can use numerical quadrature for computing the integral. As an example, Simpson's rule (a Newton–Cotes rule) has convergence rate $O(N^{-4})$ if we have only one stochastic factor. Simpson's rule approximates an integral by the following quadrature rule:

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_1) + 4f(x_2) + 2f(x_3) + 4f(x_4) + 2f(x_5) + \cdots + 4f(x_{N-1}) + f(x_N)\right],$$

where N ≥ 3 is an odd integer and h = (b − a)/(N − 1). In our numerical experiments we will deal with only one stochastic factor, and Simpson's rule will be used to compute the above integrals. In the case of conditional value-at-risk, there are points with discontinuous derivative (where F equals γ), and this might decrease the convergence rate of Simpson's rule. Note also that in the discretized conditional value-at-risk problem, for high values of β only a fraction of all scenarios are actually used in the evaluation of the objective function. The effective level of discretization is then lower than for smaller values of β.
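The quadrature rule above can be implemented directly; the sketch below follows the formula with an odd number of points N and step h = (b − a)/(N − 1).

```python
# Composite Simpson's rule as quoted in the text: weights 1, 4, 2, 4, ...,
# 4, 1 on N equally spaced points, scaled by h/3.

def simpson(f, a, b, n_points):
    assert n_points >= 3 and n_points % 2 == 1
    h = (b - a) / (n_points - 1)
    total = f(a) + f(b)
    for i in range(1, n_points - 1):
        total += (4.0 if i % 2 == 1 else 2.0) * f(a + i * h)
    return h / 3.0 * total

# Simpson's rule is exact for cubics: the integral of x**3 on [0, 2] is 4.
assert abs(simpson(lambda x: x ** 3, 0.0, 2.0, 5) - 4.0) < 1e-12
```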

6.8 Toll Pricing Problem (TP)

The problem of determining toll prices to reduce congestion is called the toll pricing problem. The idea is that a toll fee is charged to the users traveling through tolled links, or toll gates. If the toll prices and toll gate locations are well chosen, the travel pattern of the users changes so that congestion is reduced.

As described in Section 4, the user response in a road traffic system is modeled using Wardrop's principle. The resulting traffic assignment is the user equilibrium, where the users choose their routes in a selfish manner, which might lead to over-utilization of some road links and hence to a flow solution that is suboptimal from the viewpoint of society (see [BMW56, Section 4.2.2]). The marginal social cost of an additional driver on an already congested road is higher than the private cost for that additional driver. A toll fee is not considered a social cost, since the toll fee collector is society itself. By imposing tolls on the road users whose driving patterns contribute more to the social costs than to their private costs, the traffic pattern can be changed to increase the social surplus and approach the system optimal solution.

Our objective function F in the toll pricing problem is the negative social surplus introduced in Section 4.3. However, we cannot include the toll fees as part of the social cost, since the toll fees are collected by society. The total link travel cost on a link a, including toll, is given by

$$\hat{t}_a(v_a, \tau) = t_a(v_a) + \sum_{i\in\mathcal{I}} w_{ai}\,\frac{\tau_i}{\beta_{VOT}},$$

where τ are the toll prices, replacing the design parameters x in the general traffic network design problem. A matrix $W = (w_{ai})_{|\mathcal{A}|\times|\mathcal{I}|}$ is introduced. This matrix allows several toll gates to have the same toll fee. We call a set of such toll gates a toll gate group. Toll gate groups are indexed by i, where i ∈ I (the set of all toll gate groups), and each toll gate belonging to toll gate group i has toll fee $\tau_i$. The elements of W are defined as:

$$w_{ai} = \begin{cases} 1, & \text{if there is a toll gate from toll gate group } i \text{ on link } a, \\ 0, & \text{otherwise.} \end{cases}$$
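The tolled cost above is a simple matrix-vector operation. In the sketch below, the fee of each toll gate group is distributed to its links through the 0/1 matrix W and converted from money to time by $\beta_{VOT}$; the two-group, three-link example is hypothetical.

```python
# Tolled link costs t-hat_a = t_a + sum_i W[a, i] * tau_i / beta_VOT,
# computed for all links at once.
import numpy as np

def tolled_costs(t, W, tau, beta_vot=1.0):
    return t + W @ tau / beta_vot

t = np.array([5.0, 3.0, 4.0])  # untolled link travel times
W = np.array([[1, 0],          # link 1 belongs to toll gate group 1
              [1, 0],          # link 2 shares group 1's fee
              [0, 1]])         # link 3 belongs to toll gate group 2
tau = np.array([2.0, 1.0])     # one fee per toll gate group

assert tolled_costs(t, W, tau).tolist() == [7.0, 5.0, 5.0]
```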

The constant $\beta_{VOT}$ in the denominator is the value of time (VOT). Link travel cost is usually measured as a time quantity, but a toll fee is a monetary quantity. The concept of value of time is therefore introduced to convert money into time. Different people value time differently, and a realistic traffic model has to take this into account. It can be done by considering several classes of users with different VOT values. An equilibrium based on several user classes is called a multi-class traffic equilibrium. See [HY08] for an example of multi-class modeling. Again, this work is based on simplifying assumptions, and only one user class is considered.

In all numerical examples the toll fee will be given in units of time ($\beta_{VOT} = 1$). The objective function, which is a measure of the negative social surplus, becomes

$$F(\tau, v, d) = SC(v) - UB(d) = \sum_{a\in\mathcal{A}} t_a(v_a)v_a - \sum_{k\in\mathcal{C}} \int_0^{d_k} D_k^{-1}(\omega)\,d\omega. \quad (33)$$
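Evaluating (33) is straightforward once a traffic assignment is given. The sketch below is illustrative: the single OD pair with linear inverse demand $D^{-1}(d) = 20 - d$ and the link cost are assumptions chosen so the integral can be hand-checked, and a midpoint rule stands in for whatever quadrature an implementation would use.

```python
# Negative social surplus (33): total system travel cost minus the user
# benefit, the latter integrating the inverse demand function.

def neg_social_surplus(link_costs, link_flows, inv_demand, demand, steps=10000):
    social_cost = sum(t(v) * v for t, v in zip(link_costs, link_flows))
    # Midpoint-rule approximation of the user benefit integral.
    h = demand / steps
    user_benefit = h * sum(inv_demand((i + 0.5) * h) for i in range(steps))
    return social_cost - user_benefit

t1 = lambda v: 2.0 + v      # link cost, chosen for easy hand checking
inv_d = lambda d: 20.0 - d  # inverse demand
# One link carrying flow 4: SC = (2 + 4)*4 = 24, UB = 20*4 - 4**2/2 = 72.
assert abs(neg_social_surplus([t1], [4.0], inv_d, 4.0) - (24.0 - 72.0)) < 1e-6
```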

Note that there is no hat (ˆ) on t in the social cost term, since we don't include toll fees in the social cost. It is important to note, though, that when computing the user equilibrium, the cost functions including toll fees (t̂) must be used. Also, recall that v and d are implicit functions of τ.

In order to be able to use the theory on stability presented in Section 6.3, this network design problem has to fulfill Assumption 3. This is easily verified, since both C and F are continuously differentiable w.r.t. both τ and y when BPR or TU71 functions are used.

A stochastic extension of the toll pricing problem, STP, will be considered in the numerical experiments, but only for instances where TU71 cost functions are used. In STP, the stochastics enter as a single random variable replacing the capacity $c_{a,3}$ for a in a subset of A. The replacing variable is

$$\hat{c}_{a,3} = c_{a,3}\,\omega, \quad (34)$$

where ω ∈ U(1 − u, 1 + u). We need to check that our discretization schemes are still stable. If u < 1, Assumption 6 holds, since for each ω ∈ Ω = U(1 − u, 1 + u) we have just changed the capacity of a link, and the capacity remains positive. Further, C and F are differentiable w.r.t. ω. The TU71 cost function then has the following form:

t_a(v_a) = \frac{v_a}{c_{a,1}} + c_{a,2} \left( 1 + \left( \frac{v_a}{c_{a,3} \, \omega} \right)^{c_{a,4}} \right) + c_{a,0}.

If we can toll only a subset of the links, or if we don't have the freedom to toll links individually, i.e., the matrix W is not the identity matrix, we say we are concerned with the problem of second-best toll pricing. On the other hand, if we can toll each link individually, i.e., the matrix W is the identity matrix, we have a first-best toll pricing problem.

With first-best toll pricing it is possible to achieve a system optimal flow solution as a user equilibrium by imposing the correct toll on each link. The reason this is possible is that the tolls are not considered social costs, but still affect the driving pattern. One way of achieving first-best tolls is marginal-cost pricing. Since all links are tollable, we talk about a toll vector τ = (τ_1, ..., τ_{|A|}) of individual link tolls. It is well known that if we impose the toll

\tau_a = v_a^{(SO)} \frac{\partial t_a(v_a^{(SO)})}{\partial v_a}   (35)

on each link a, where v_a^{(SO)} is the system optimal flow on link a, the user equilibrium is exactly the system optimal traffic assignment.

In [LP98], it is shown that there might exist several toll vectors giving rise to a system optimal flow solution, in addition to the marginal-cost toll vector. In [YL09] such toll vectors are called valid toll vectors, and all valid toll vectors are hence solutions to the first-best toll pricing problem. Valid toll vectors still have the property that the user of a route is tolled the additional cost that user inflicts on all other users by using that route. Hence, for first-best toll prices the tolls need to be valid only on route level, not necessarily on link level. This fact implies that the first-best toll pricing problem might have several local minima.
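Equation (35) can be checked on a small two-link Pigou-style instance (a hypothetical illustration, not an instance from this thesis): one uncongested link with constant cost and one congested link. Imposing the marginal-cost tolls makes the tolled user equilibrium coincide with the system optimum.

```python
def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection (f changes sign on the interval)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical two-link network, fixed total demand d = 1:
# link 1 has t1(v) = 1 (uncongested), link 2 has t2(v) = v (congested).
d = 1.0

# System optimum: minimize (d - v2)*1 + v2*v2; stationarity gives 2*v2 - 1 = 0.
v2_so = bisect(lambda v2: 2.0 * v2 - 1.0, 0.0, d)
v1_so = d - v2_so

# Marginal-cost tolls, equation (35): tau_a = v_a^(SO) * t_a'(v_a^(SO)).
tau1 = v1_so * 0.0   # t1' = 0
tau2 = v2_so * 1.0   # t2' = 1

# Tolled user equilibrium: equal generalized costs, 1 + tau1 = v2 + tau2.
v2_ue = bisect(lambda v2: (v2 + tau2) - (1.0 + tau1), 0.0, d)

# The tolled user equilibrium reproduces the system optimal flow.
assert abs(v2_ue - v2_so) < 1e-9
```

The untolled equilibrium would put all demand on link 2; with the marginal-cost toll τ2 = 1/2, the flow splits evenly, which is exactly the system optimum.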

6.9 Capacity Expansion Problem (CEP)

The second network design problem considered is the problem of deciding to what extent the capacity of a number of predetermined links in the network has to be increased to give maximum social surplus. Improving the capacity of a link comes at an investment cost, ruling out the solutions that make the capacities arbitrarily large.

The capacity expansions ρ are our design parameters, replacing x in the general network design problem. The objective function for this problem is negative social surplus plus the investment costs:

F(\rho, v, d) = SC(v, \rho) - UB(d, \rho) + \phi(\rho)
             = \sum_{a \in A} t_a(\rho, v_a) v_a - \sum_{k \in C} \int_0^{d_k} D_k^{-1}(\omega) \, d\omega + \sum_i \phi_i(\rho_i),   (36)

where φ_i(ρ_i) is the investment cost function, which is separable in the individual expansion parameters ρ_i in the vector ρ = (ρ_i)_i. Each ρ_i corresponds to an expansion group, indexed by i ∈ I. An expansion group is a concept analogous to that of toll gate groups for the toll pricing problem, and is a set of links. All links in an expansion group i share the same expansion parameter ρ_i. A matrix W will be used to couple the expansion parameters to their links, and is defined as follows:

W = (w_{ai})_{|A| \times |I|}, \quad w_{ai} = \begin{cases} 1, & \text{if link } a \text{ is in expansion group } i, \\ 0, & \text{otherwise.} \end{cases}

For some problems the expansion groups contain only one link. In Section 5 two classes of cost functions are presented: the BPR functions and the TU71 functions. Only the BPR functions will be used when considering the capacity expansion problem. The BPR function (8) is modified and takes the following form for this problem:

t_a(v_a) = b_{a,1} + b_{a,2} \left( \frac{v_a}{b_{a,3} + W\rho} \right)^{b_{a,4}}.

This network design problem fulfills Assumption 3 given that the functions φ_i are directionally differentiable. Some of the problems considered are stochastic, and in our numerical experiments for stochastic CEP (SCEP), it is always the total capacity (i.e., expansions included) that is a stochastic variable. This means that b_{a,3} will remain as it is, while b_{a,2} is replaced by the stochastic variable \hat{b}_{a,2} for a in a subset of A:

\hat{b}_{a,2} = b_{a,2} \, \omega^{-b_{a,4}},   (37)

where ω ∈ Ω (see Section 6.7) is a uniformly distributed variable with ω ∈ U(1 − u, 1 + u), and u is a variable determining the magnitude of the possible deviations. The variance of this uniform distribution is u²/3. The full cost function

t_a(v_a) = b_{a,1} + b_{a,2} \left( \frac{v_a}{\omega \, (b_{a,3} + W\rho)} \right)^{b_{a,4}}   (38)

then has total (original plus expanded) capacity as a stochastic variable. For our discretization schemes to remain robust, Assumption 6 must hold, which it does to the same extent as without the stochastics added, if u < 1. (Recall that strong regularity isn't guaranteed for the BPR functions, see Section 6.5.) Using the implicit programming formulation where F = F(ρ), we can compute the derivative of F(ρ) at points ρ where it is differentiable:

\frac{\partial F(\rho)}{\partial \rho_j} = \sum_{k \in C} \sum_{r \in R_k} \left( h_{kr} \frac{\partial c_{kr}}{\partial \rho_j} + c_{kr} \frac{\partial h_{kr}}{\partial \rho_j} \right) - \sum_{k \in C} D_k^{-1}(d_k) \frac{\partial d_k}{\partial \rho_j} + \frac{\partial \phi_j}{\partial \rho_j}.

From the conservation of flow we have \sum_{r \in R_k} \frac{\partial h_{kr}}{\partial \rho_j} - \frac{\partial d_k}{\partial \rho_j} = 0 for all k ∈ C. Combining that with the fact that D_k^{-1}(d_k) = c_{kr} = \pi_k for all k ∈ C and r ∈ R_k makes it possible to cancel some terms, and we have

\frac{\partial F(\rho)}{\partial \rho_j} = \sum_{k \in C} \sum_{r \in R_k} h_{kr} \frac{\partial c_{kr}}{\partial \rho_j} + \sum_i \frac{\partial \phi_i}{\partial \rho_j} = \sum_{a \in A} v_a \frac{\partial t_a}{\partial \rho_j} + \frac{\partial \phi_j}{\partial \rho_j}.   (39)
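For a single link with fixed demand, the flow does not change with ρ, and (39) reduces to ∂F/∂ρ = v ∂t/∂ρ + φ'(ρ). The following sketch checks that reduced formula against a central finite difference, using illustrative, assumed coefficients for the modified BPR function and a linear investment cost:

```python
# Illustrative, assumed coefficients for one link with the modified BPR function.
b1, b2, b3, b4 = 1.0, 0.15, 2.0, 4.0
v = 3.0                       # fixed demand, so the link flow is constant
phi = lambda rho: 0.5 * rho   # assumed linear investment cost

def t(rho):
    """Modified BPR cost with capacity expansion rho."""
    return b1 + b2 * (v / (b3 + rho)) ** b4

def F(rho):
    """Objective: total travel cost plus investment cost."""
    return t(rho) * v + phi(rho)

def dF(rho):
    """Equation (39), reduced to one link with fixed demand."""
    dt_drho = -b2 * b4 * v ** b4 / (b3 + rho) ** (b4 + 1)
    return v * dt_drho + 0.5

rho, h = 1.0, 1e-6
fd = (F(rho + h) - F(rho - h)) / (2.0 * h)   # central finite difference
assert abs(dF(rho) - fd) < 1e-5              # analytic and numeric agree
```

With these numbers dF(1.0) = 3·(−0.2) + 0.5 = −0.1, so expanding the capacity still pays off at ρ = 1.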

7 Approach I: Implicit Programming

The first approach to solving the network design problem, deterministic as well as stochastic, is based on the fact that we can compute the user equilibrium solution by evaluating a function σ and hence incorporate the user equilibrium implicitly in the problem formulation through σ. The network design problem (16) is solved as if it were a nonlinear programming problem in the design variables only. If we view (16) as a nonlinear programming problem, solvers for that problem usually need to be able to compute the objective function value and gradient. Evaluation of F(x) requires computation of σ(x), i.e., the solution to the traffic assignment problem. Further, in order to compute the gradient ∇F(x), we need ∇σ(x), which can be computed through sensitivity analysis of the traffic assignment solution (Section 6.4), which in turn requires route flow information from the traffic assignment solution. The discretized stochastic problem (SMPEC)_N or (SRPEC)_N can also be regarded as a nonlinear programming problem of the form (16), but requires several traffic assignment problems to be solved for different values of the stochastic parameters. For the solution of the problem, we will consider both local and global methods. Consequently, for this approach we need solvers for

• the traffic assignment problem (3),

• the sensitivity analysis problem (28),

• local and global solution of the network design problem (16),

• local solution of the discretized stochastic network design problems ((SMPEC)_N and (SRPEC)_N).

The first subsection, Section 7.1, begins at the bottom and describes how the traffic assignment problem is solved, and Section 7.2 discusses the solution of the sensitivity analysis problem for gradient computation. Section 7.3 describes how the stochastic extension of the network design problem is handled. Section 7.4 deals with local optimization methods, while Section 7.5 discusses global optimization methods.

7.1 Solving the Traffic Assignment Problem

The traffic assignment problem is the core of the implicit equilibrium formulation approach. The objective function depends on the link flow and demand solution from the traffic assignment problem. The traffic assignment solution, and in particular the route flow data, is needed to compute a subgradient according to the theory developed in Section 6.4. Besides that, a very accurate traffic assignment solution is needed to compute an accurate subgradient approximation.

• FW - The Frank-Wolfe method ([FW56]) has been widely used to solve the traffic assignment problem. It is known for having bad convergence properties and it doesn't return route flow data. Hence, this method was not chosen to be used in this work.

• DSD - The disaggregate simplicial decomposition (DSD) algorithm ([LP92]) has previously been implemented in [Jos03] for solving the traffic assignment problem. It works in route flow space and hence can return route flow data. The convergence is better than for Frank-Wolfe, but it still has problems for larger problems. This algorithm was chosen to be used in this work.

• OBA - A third algorithm is the Origin-Based Algorithm for the Traffic Assignment Problem (OBA) described in [BG02]. It works with route flows and has shown very good convergence properties.

• TAPAS - A fourth algorithm is Traffic Assignment by Paired Alternative Segments (TAPAS) described in [BGbl], which also works with route flows and shows even better convergence properties.

Only the DSD algorithm will be used and described further. The algorithms by Hillel Bar-Gera (OBA and TAPAS) would probably be better choices though. In [LP92] disaggregate simplicial decomposition (DSD) is presented as an algorithm for solving the traffic assignment problem with fixed demand. It is a method utilizing a column generation step to generate

new variables, or possible routes for each OD pair, when they are needed. After the columns are generated, a restricted master problem (RMP) is solved, which in itself is a traffic assignment problem, but with a fixed and restricted set of routes. To solve the RMP, the variables for each OD pair are represented separately for finding a search direction, and this is done in link flow space.

Route based. Important to our application is that the DSD algorithm is a route based solver, from which we can obtain route flow information for a solution. This is necessary for us to solve the sensitivity analysis problem discussed above.

Elastic demand. Our aim here is to adapt this algorithm to handle elastic demands. The column generation step can be left untouched. It finds and adds routes that are cheaper than those already present in the RMP. If the RMP terminates with an equilibrium solution and no more cheapest routes can be found, the full traffic assignment problem has been solved, even if the demands are elastic. This means we only need to consider the RMP here. We will use the excess demand approach to handle elastic demand in the solver. Consider problem (3). The demand variable can be replaced by an excess demand variable through the following transformation

e_k = d_k^{\max} - d_k, \quad \forall k \in C,

where d_k^{\max} must be chosen large enough. We introduce a new function W_k(e_k) as

W_k(e_k) = D_k^{-1}(d_k^{\max} - e_k).

Eliminating the link flow variables, the transformation and the new function yield the following equivalent formulation of problem (3):

\underset{(h,e)}{\text{minimize}} \; z(h, e) = \sum_{a \in A} \int_0^{\sum_{k \in C} \sum_{r \in R_k} \delta_{kra} h_{kr}} t_a(\omega) \, d\omega + \sum_{k \in C} \int_0^{e_k} W_k(\omega) \, d\omega,

subject to \sum_{r \in R_k} h_{kr} + e_k = d_k^{\max}, \quad \forall k \in C,   (40)
           h \geq 0, \quad e \geq 0.

The non-negativity constraints of d are tacitly enforced through the non-negativity constraints of h and the equality constraints. This formulation has an intuitive interpretation. It can be interpreted as a fixed demand problem, where for each OD pair k we have the demand d_k^{\max} and a route between the origin and destination with cost function W_k.

The choice of d_k^{\max} can be done by first computing the travel cost on a totally unloaded and empty network, i.e., with zero flow, which gives the cost π_k^{\min}. The flow at the excess demand link can then not exceed d_k^{\max}. Figure 5 illustrates the argument. With this formulation we can almost directly apply the DSD algorithm. What we need to do is to add the excess demand links separately as new links and routes, and also have a mechanism for automatically setting the demands from within the code. Also, support for two kinds of cost functions (regular cost functions and inverse demand functions) needs to be implemented.
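The excess demand transformation can be illustrated on a single OD pair with one route (illustrative functions, not an instance from the thesis). With route cost t(h) = 1 + h² and inverse demand D^{-1}(d) = 4 − d, we get π^min = t(0) = 1 and d^max = D(π^min) = 3; the artificial excess demand route then carries the cost function W(e) = D^{-1}(d^max − e):

```python
def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection (f changes sign on the interval)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t = lambda h: 1.0 + h * h        # route cost (illustrative)
D_inv = lambda d: 4.0 - d        # inverse demand (illustrative)

d_max = 3.0                      # D(pi_min) with pi_min = t(0) = 1
W = lambda e: D_inv(d_max - e)   # cost on the artificial excess demand route

# Fixed demand equilibrium between the real route (flow h) and the
# excess demand route (flow e = d_max - h): t(h) = W(d_max - h).
h_star = bisect(lambda h: t(h) - W(d_max - h), 0.0, d_max)
e_star = d_max - h_star

# The realized demand h_star satisfies the original elastic demand
# equilibrium condition t(d) = D_inv(d):
assert abs(t(h_star) - D_inv(h_star)) < 1e-6
```

The fixed demand d^max splits between the real route (the realized demand) and the excess demand route (the demand that stays home), exactly as in (40).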

Termination criterion. As termination criterion for the RMP in the DSD algorithm, a test on the relative duality gap RDG from Section 4.4 is used:

RDG < \epsilon_{DSD}.

If this is fulfilled, the RMP is considered solved. If it is not possible to find any new routes in the network that are cheaper than those that already exist, the DSD algorithm terminates. Otherwise, the new routes are added and a new RMP is solved.

Figure 5: As d_k^{\max} (an upper limit of the demand), the demand that is mapped from π_k^{\min} (a lower limit of the cost) through the demand function is taken.

7.2 Solving the Sensitivity Analysis Problem

The sensitivity analysis problem (28) is in the form of a traffic assignment problem, TAP, but with negative route flows allowed; it is also a quadratic programming problem, QP. The question of whether to solve it as a TAP or as a QP then immediately arises. A general TAP solver doesn't, without modification, handle the general case of a quadratic objective function, but can take advantage of the special structure of the constraints that a TAP possesses; in this work, the DSD algorithm modified for negative route flows can be used. A general QP solver can on the other hand use that the objective function is quadratic and that the constraints are linear. General QP solvers have been developed for years, and using a QP solver would definitely be an alternative to consider. After some experimenting with our traffic assignment solver (DSDTAP) and a QP solver (SQOPT), it was noted that using the general QP solver was faster and also more reliable for solving the sensitivity analysis problem, and hence the QP solver has been used in all experiments. Since little experience was gathered from this experiment, no reliable conclusion regarding the optimal choice of solver for this task should be drawn from it.

It is also necessary to determine the index sets Π¹_k, Π²_k and Π³_k that each route r ∈ R_k, k ∈ C, should belong to, since the definition in (27) assumes that the TAP is solved perfectly. The sets are defined in (27) and discriminate routes based on their route flow and excess cost. In a numerical solution we can have h*_{kr}(c_{kr}(x*, h*) − π*_k) > 0, and we then have to decide which index set such a route should belong to. Three rules for assigning the routes to index sets were tested:

Rule A (41): a route is assigned to Π¹_k or Π³_k according to whether its excess cost c_{kr}(x*, h*) − π*_k is at most or greater than its flow h*_{kr}; only routes with both zero flow and zero excess cost are assigned to Π²_k.

Rule B (42): with the route flow scaled by d*_k and the excess cost scaled by π*_k, a route with scaled excess cost of at least ε_fuzz is assigned to Π³_k; otherwise it is assigned to Π¹_k if its scaled flow exceeds ε_fuzz, and to Π²_k if not.

Rule C (43): a heuristic threshold rule, based on [Pat04], which thresholds the scaled route flow at ε_fuzz,
lim r , r − k , π > and and and | | i π t h h c C ∈ k ( ∗ 30 kr ∗ kr ∗ c kr k ∗ ) Π kr c c c ( > k 1 ≤  > and ] ( kr kr kr x , x , ∗ ∗ Π h ( ( ( 0  h , ,h x x x kr ∗ fuzz fuzz k 2 n ehv odcd hc ne e route set index which decide to have we and ∗ ∗ ∗ ∗ ∗ c ) h , h , h , and kr − = ) d d π k ∗ k ∗ ∗ ∗ ∗ ( Upper demand limit x k ∗ } } ) ) ) ) ∗ h , , π Π − − − ,h kr ∗ D ≤ k ∗ k 3 e ∗ π π π } m ) 1 , o l Dpairs OD all for a k k k ∗ ∗ ∗ − n } d π ≥  <  < , k ∗  > fuzz fuzz fuzz 1 } π π π , k k k ∗ ∗ ∗ } } } , , , k π k min C ∈ alwrlmtof limit lower (a hs index These . (43) (41) (42) where fuzz > 0 is wisely chosen. Figure 6 illustrates how routes are assigned according to the three rules. Rule A (41) is motivated by the fact that the set of non-strictly complementary solutions is a null- set in the entire feasible set, and we have strict complementarity w.p.1 if a point is selected randomly. Therefore, if we would guess which index set a certain route actually belongs to, it is probably one of 1 3 Πk or Πk. Rule B (42) introduces a “fuzz”-factor fuzz > 0 (as in [Jos03]) which separates the route flow- 2 excess cost space into three regions. This assignment makes more routes go into Πk and can be seen as a way of smoothing out the sets of degeneracy. Rule C is based on a heuristic described in [Pat04]. In Rule A it is important that the variables are properly scaled, since the ratios involved in the defi- nitions are not dimensionless, and in Rule B and C we are instead faced with the question of what value of fuzz to choose. In Section 9.3 an experiment is performed where the different rules are compared.

(a) Rule A (41) (b) Rule B (42) (c) Rule C (43)

Figure 6: The x-axis is route flow, while the y-axis is route excess cost. Each route in a perfect TAP solution would correspond to a point on the positive x- or y-axis. However, in an approximate numerical solution, routes do not always fulfill h*_{kr}(c*_{kr} − π*_k) = 0, and those routes correspond to points within the first quadrant. In (a), index set assignment Rule A (41) is illustrated. Routes at the origin belong to Π²_k, while routes below or at the dashed line belong to Π¹_k and routes above it to Π³_k. In (b) and (c), Rule B (42) and Rule C (43), respectively, are illustrated. Note that the axes in (b) and (c) are scaled separately for each OD pair by (d*_k, π*_k) (indicated by the symbol [1] for dimensionless quantity).

7.3 Solving the Stochastic Extension

The stochastic extension (SMPEC) is solved by, instead of using F as the objective function, using the discretized stochastic counterpart F̂_N or Ĝ_N from Section 6.7, depending on whether we are working with the expectation value or the conditional value-at-risk objective function. This requires N function values (scenarios) σ(x, ω_i) to be evaluated; they are independent and can be computed in parallel. We are using Guidelines 1 and 2 here, and will denote by ∇f(x) the gradient composed of the directional derivatives, even at nondifferentiable points! Computation of ∇F̂_N(x) or ∇Ĝ_N(x) requires computation of ∇σ(x, ω_i), which also can be computed in parallel for different scenarios i. The computational burden for computing an objective function value or objective function subgradient thus increases linearly in the number of scenarios for each evaluation.
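A minimal sketch of the discretized expectation objective F̂_N: the scenario objective below is an illustrative one-link cost with the capacity scaled by ω_i, and every name and number is an assumption. The point is only that F̂_N is a plain average of independent scenario evaluations, each of which could be computed in parallel.

```python
import random

def scenario_objective(x, omega):
    """Illustrative one-link objective; stands in for evaluating sigma(x, omega_i)
    (a traffic assignment) followed by the objective function."""
    capacity = 2.0 * omega             # stochastic capacity, cf. (34)
    flow = 3.0
    t = 1.0 + (flow / (capacity + x)) ** 2
    return t * flow + 0.5 * x          # travel cost plus design cost

def F_hat(x, n, u, seed=0):
    """Sample average over n scenarios with omega ~ U(1 - u, 1 + u)."""
    rng = random.Random(seed)
    omegas = [rng.uniform(1.0 - u, 1.0 + u) for _ in range(n)]
    return sum(scenario_objective(x, w) for w in omegas) / n

# With u = 0 every scenario collapses to the deterministic problem:
assert abs(F_hat(1.0, 5, 0.0) - scenario_objective(1.0, 1.0)) < 1e-12
```

In the real problem each `scenario_objective` call is a full traffic assignment, which is why the cost of one F̂_N evaluation grows linearly in N.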

The stochastic factors ω_i can be seen as perturbations of the original problem. Something that might be worth investigating further is how the nonsmoothness of F̂_N is affected as N → ∞. If we minimize the expectation value, the objective function is nothing but the average of a number of nonsmooth functions. If the points of discontinuity in the derivative move as ω is perturbed, and the probability distribution is smooth, one can suspect that the nonsmoothness is reduced as N → ∞. On the other hand, the set of nondifferentiable points will grow as more scenarios are included, which might make it tougher for the numerical algorithms, especially if there is some "numerical fuzziness" close to the nondifferentiable points.

7.4 Solving the Network Design Problem Locally

Since the implicit programming problem (16) is in general a nonsmooth nonlinear program, it is a problem that regular descent methods might have difficulties solving. Through sensitivity analysis of the traffic assignment problem we have access to gradients at differentiable points. As mentioned in Section 6.4, theoretically a method will never request a gradient or subgradient at a nondifferentiable point, since they all lie in a measure-zero set. On the other hand, numerically these nondifferentiable points get smoothed out and actually might cover a substantial part of the search space, if the numerical precision of the traffic assignment solution is too low. This might cause the methods to suffer from numerical difficulties close to nondifferentiable points. Three methods for solving the network design problem locally are considered here:

• SDBH - A steepest descent-BFGS hybrid algorithm, inspired by and based on the algorithm SBD, which is developed and implemented in [Jos03] and further investigated and developed in [Eks08]. The box constraints are enforced by setting a maximum step length during line search.

• SNOPT - A general purpose nonlinear programming solver based on sequential quadratic pro- gramming. See [GMS05].

• LMBM-B - A bundle-method for box bounded nonsmooth optimization. See [Haa04].

SDBH and SNOPT are not adapted for nonsmooth optimization and it might be interesting to see how they compare with LMBM-B. In [GMW86, Section 4.2] it is written that if a problem is not too nonsmooth, a regular optimization method for smooth problems might give better results than one specialized for nonsmooth optimization (e.g., Hooke-Jeeves).

7.4.1 SDBH

This solver is for box bounded nonlinear programs. It is based on the solver presented in [Jos03]. An initial point, simple bounds, an objective function and a gradient function must be provided. For search direction generation it has two methods: steepest descent and a BFGS update algorithm. For the linesearch, an Armijo step length rule is used, and a maximum step length prevents the method from generating a point outside the convex feasible set. The termination criterion is based on the difference in objective function value between two consecutive iterations, and a smallest step length. The gradient is not used in the termination criterion, since we might be close to, or at, a point where the derivative is discontinuous at the optimum.

Search direction generation. The SDBH algorithm uses gradient information to find a search direction and can compute the search direction according to the rules of

• steepest descent, and

• BFGS update.

The initial point is assumed to be feasible and the box constraints are enforced by bounding the step length so as not to allow evaluation of points outside the feasible set. Steepest descent is known for its poor convergence properties for ill-conditioned problems, i.e., where the identity matrix is a bad approximation of the Hessian ([CZ08, Section 8.3]). A BFGS update of an approximate Hessian is hence also utilized to give better convergence properties of the algorithm. However, we can not use only the BFGS update during the optimization process, since it keeps an approximation of the Hessian between the iterations, and one can suspect that the approximation might become poor while making progress on a nonsmooth objective function. Instead, the steepest descent method is used in the beginning of the process. When little progress is made in the function value between iterations, i.e., (F^k − F^{k−1})/(1 + |F^k|) < \epsilon_{BFGS} (where F^k is the objective function value of iteration k), the BFGS algorithm is started. The BFGS updated Hessian is used as long as it generates a descent search direction. If an ascent direction is generated, the algorithm switches to using steepest descent again. If the BFGS algorithm is used when the termination criterion is fulfilled, the SDBH algorithm does not exit, but changes to steepest descent in a last attempt to find a descent direction. At each switch

to using BFGS update, the approximate Hessian is reset to the identity matrix. Regardless of the search direction generation method, the search direction is projected onto the cone of feasible directions, to allow for arbitrarily small steps and to prevent the linesearch from generating points outside the feasible region.

Linesearch. The linesearch is done using the Armijo step length rule. An initial guess is given, and the Armijo step length rule either accepts or rejects it. If the initial guess is accepted by the Armijo step length rule, the step length is doubled and the Armijo rule is evaluated until the step length is no longer accepted. For each accepted step length, the objective function is evaluated. Among the tested step lengths, the one that corresponds to the point with the lowest objective function value is finally chosen. On the other hand, if the initial guess is not accepted, the step length is halved until it is accepted or becomes too short. As soon as the step length is accepted, it is chosen as the step length to apply.

Algorithm 1 SDBH linesearch

Initialization:
  Choose an Armijo factor µ, e.g., µ = 0.01
  Choose a multiplier κ, e.g., κ = 2
  Choose a minimum and maximum step length αL, αU
  Choose a step length guess α ∈ [αL, αU], e.g., the last step length
  Let x be the current point and d the normalized search direction
  a ← 0
  b ← 0

Main step:
  repeat
    if F(x) − µα ≥ F(x + αd) then
      a ← 1
      if b = 0 then
        if α < αU then
          α ← min(αU, κα)
        else
          b ← 1
        end if
      end if
    else
      b ← 1
      α ← α/κ
      if α < αL then
        a ← 1
      end if
    end if
  until a = b = 1
  α is our step length

Algorithm 1 describes the linesearch procedure of SDBH. Before starting the linesearch, the parameter αU in Algorithm 1 is computed as the largest step length possible without violating the constraints, which are convex since they are just simple bounds on the variables. This is a means of preventing the linesearch from generating points outside the feasible region.
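The linesearch can be transcribed almost directly as a sketch. Per the description above, the doubling phase keeps the accepted step with the lowest objective value, and steps are capped at αU; the default cap below is an illustrative stand-in for the feasibility-derived αU of SDBH.

```python
def sdbh_linesearch(F, x, d, alpha0, mu=0.01, kappa=2.0,
                    alpha_min=1e-10, alpha_max=4.0):
    """Armijo step length rule with doubling/halving, in the spirit of Algorithm 1."""
    def accepted(a):
        # Armijo test from Algorithm 1 (d is the normalized search direction)
        return F(x) - mu * a >= F(x + a * d)

    alpha = alpha0
    if accepted(alpha):
        # Doubling phase: among the accepted step lengths, keep the one
        # giving the lowest objective value.
        best_alpha, best_val = alpha, F(x + alpha * d)
        while alpha < alpha_max:
            alpha = min(alpha_max, kappa * alpha)   # cap the step at alpha_max
            if not accepted(alpha):
                break
            val = F(x + alpha * d)
            if val < best_val:
                best_alpha, best_val = alpha, val
        return best_alpha
    # Halving phase: shrink until accepted or too short.
    while alpha > alpha_min:
        alpha /= kappa
        if accepted(alpha):
            return alpha
    return alpha

# Minimizing F(x) = x^2 from x = 1 along d = -1: doubling from 0.5
# tests alpha = 0.5 and 1.0, rejects 2.0, and keeps the best step.
step = sdbh_linesearch(lambda z: z * z, 1.0, -1.0, alpha0=0.5)
assert step == 1.0
```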

Termination criterion. As termination criterion, the following is used ([GMW86, Section 8.2.3]):

F(x^k) - F(x^{k-1}) < \epsilon_T (1 + |F(x^k)|)   and   \|x^k - x^{k-1}\| < \sqrt{\epsilon_T} \, (1 + \|x^k\|),

where k is the iteration count and \epsilon_T is a termination tolerance. Note that no check is made on the norm of the gradient, since the optimum might be at a nondifferentiable point.
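As a sketch, the two-part test can be implemented as below; the function value part is interpreted here as the decrease F(x^{k−1}) − F(x^k), an assumption that presumes descent steps.

```python
import math

def sdbh_terminated(F_prev, F_curr, x_prev, x_curr, eps_T=1e-8):
    """Two-part termination test in the spirit of [GMW86, Section 8.2.3]:
    small objective decrease and small step, both in relative measure."""
    f_small = F_prev - F_curr < eps_T * (1.0 + abs(F_curr))
    dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_curr, x_prev)))
    x_norm = math.sqrt(sum(a * a for a in x_curr))
    x_small = dx < math.sqrt(eps_T) * (1.0 + x_norm)
    return f_small and x_small

# Tiny decrease and tiny step: terminate. Large decrease: keep going.
assert sdbh_terminated(1.0, 1.0 - 1e-12, [0.5, 0.5], [0.5, 0.5 + 1e-9])
assert not sdbh_terminated(1.0, 0.5, [0.5, 0.5], [0.0, 0.0])
```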

7.4.2 SNOPT

SNOPT ([GMS05]) is a large-scale nonlinear programming solver which is well-suited for objective functions and gradients that are expensive to evaluate. SNOPT doesn't claim to handle nonsmooth problems, and it uses the gradient not only for finding a search direction, but also for checking optimality (by comparing it to zero in the unconstrained case). However, it can exit without all optimality conditions being met when little progress is made. SNOPT implements a sequential quadratic programming (SQP) method and utilizes first-order derivatives to form quasi-Newton approximations of the Lagrangian. This solver was included because it is a well-known and accessible solver, and even though it is not made for nonsmooth problems, the above mentioned argument from [GMW86], that solvers for smooth problems are efficient on nonsmooth problems if the nonsmoothness is not too apparent, has been used to justify this choice.

7.4.3 LMBM-B

LMBM-B is a bundle method for large-scale nonsmooth box constrained problems ([KMbl]) and is an extension of the LMBM method ([HMM07]), which is similar to LMBM-B except that it requires the problem to be unconstrained. A bundle method is a natural choice for the problem we have at hand, since it is nonsmooth and we have a method for computing subgradients (see Guidelines 1 and 2). Our problem is in general not large-scale in the sense of variable count, but this algorithm was still chosen since it was freely available and easily accessible from MATLAB. A bundle method makes progress in the search space through serious steps and gathers information about a point and a subdifferential by taking null steps. Through the computation of subgradients a search direction is formed and a linesearch along it is performed. If the linesearch along the search direction gives enough descent, a serious step is taken. Otherwise, a null step is taken, which means a subgradient of a nearby point is computed and incorporated into an aggregate subgradient. In [Cla83, Theorem 2.5.1] the Clarke subdifferential of f at x is shown to be equal to the convex hull of the limit points of ∇f(x_i), when x_i → x and f is differentiable at x_i for all i. (Compare with the definition of the Clarke subdifferential (23) in Section 6.4.) Since ∂f(x) is closed, bounded and convex, ∂f(x_i) = {∇f(x_i)}, and since any bounded sequence has a convergent subsequence, we have that ∂f(x) contains the mentioned convex hull. Showing that the Clarke subdifferential doesn't contain any other vectors, establishing the equality, can also be done; see [Cla83, Theorem 2.5.1]. The purpose of the null step is based on this property. In the null steps one computes nearby gradients in order to create an approximate representation of the Clarke subdifferential, from which a descent direction then can be extracted. This choice of method is motivated by the discussion in Section 6.4.
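To make this construction concrete, consider the textbook example f(x) = |x| at x = 0 (an illustration, not from the thesis):

```latex
% Gradients of f(x) = |x| near x = 0:
\nabla f(x_i) = \begin{cases} -1, & x_i < 0, \\ +1, & x_i > 0, \end{cases}
\qquad x_i \to 0.
% The Clarke subdifferential is the convex hull of the attainable limits:
\partial f(0) = \operatorname{conv}\{-1, +1\} = [-1, 1].
```

A null step evaluated at a nearby point x_i ≠ 0 returns one of the extreme subgradients ∓1; aggregating gradients collected on both sides approximates the whole interval, and since 0 ∈ [−1, 1], a bundle method can recognize x = 0 as stationary even though no single nearby gradient vanishes.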

Termination criterion. LMBM-B has several options for the termination criterion. The standard termination criterion is based on the magnitude and quality of the descent direction and the aggregate subgradient. However, it was found difficult to choose appropriate tolerances for the standard termination criterion that made it neither stop prematurely nor continue endlessly. Instead, the optimization stops when the progress in objective function value between two consecutive serious steps is less than a value (\epsilon_L) close to the precision of the objective function.

7.5 Solving the Network Design Problem Globally

Global optimization is the art of finding the best local optimum or optima among several local optima of a function. There is no universal recipe for global optimization of the general optimization problem. It has been shown ([SB98]) that in order to guarantee that a global optimum has been found (or to give a guarantee of the quality of the found solution), so called global information about the problem is needed. Global information can for instance be a Lipschitz constant, the optimal value or even the optimal point itself. However, global information is something that has to be provided by the one who owns the problem, and can not in general be assumed by a solver. If working with a specific problem

class, global information can sometimes easily be obtained, e.g., for some problems a global Lipschitz constant can easily be computed. Methods for global optimization can be divided into two categories: deterministic and non-deterministic. A deterministic method can, if run long enough, find a global optimum within a given accuracy. For such a method, global information about the problem is needed. A non-deterministic method uses heuristics or stochastics to search the domain for global optima. These methods usually guarantee nothing, and are built on tacit assumptions about the problem that there are no pronounced ways to check. However, in practice they can perform better than local solvers on problems where several local minima exist. As mentioned in Section 7.4, the objective function value and gradient are expensive to evaluate. This restricts us in the choice of global optimization algorithm. Four global methods will be discussed here, among which three will later be used:

• Branch and bound - A deterministic method that subdivides the feasible set into disjoint subsets and rules out subsets that cannot possibly contain the global optimum.

• NFFM - A filled function method, and a non-deterministic method, that first finds a local minimum and then tries to escape it by minimizing a transformed, so called filled function, to find another and better local minimum. The idea is to do this iteratively until no better local minimum can be found.

• EGO - A derivative-free non-deterministic surrogate model method that first samples the objective function randomly at a number of points in order to create a surrogate function for the objective function. The surrogate function is used to find potential global optima and hence new points to sample. The process is iterated. It tries to make as much use of the objective evaluations as possible and is hence commonly used for problems with costly objective functions.

• DIRECT - A derivative-free non-deterministic method based on Lipschitzian optimization, but without a Lipschitz constant. It works by dividing the search space into rectangles and evaluating their midpoints. A set of rectangles is then chosen to divide next, based on their potential to contain a global minimum.

7.5.1 Branch and Bound

The branch and bound method is a so called deterministic approach to global optimization. This means that it can isolate the optimum in the search space, as well as in the objective value, to arbitrary precision, if it is given enough time. The method consists of four phases: relaxing, branching, bounding and pruning. The first phase is to relax the problem in order to provide lower bounds on the objective value over a subset of the search space. Initially the whole search space is considered as one big piece and an upper and lower bound of the global minimum is computed. In the branching phase one piece (of some clever choice) of the search space is divided into a number of new pieces (disjoint subsets of the divided piece). Then new upper and lower bounds for the new disjoint sets are computed. If some pieces can be shown not to hold the global optimum (by comparing upper and lower bounds), they are discarded (or pruned). This is then repeated from the branching phase until satisfactory precision has been achieved. Note that the bounds must become tighter and tighter as the pieces are getting smaller, or it will not be possible to achieve arbitrary precision. This method makes use of global information in terms of the upper and lower bounds of the objective function. These bounds can be computed by the use of interval analysis, convex underestimation or concave overestimation, or by the use of a Lipschitz constant ([Han92], [Flo00]). The time complexity of the branch and bound algorithm is exponential in the number of variables, and the method is therefore not suited for problems of high dimensionality. If the number of variables is, say, 20, splitting each dimension once gives approximately 10^6 regions that need to be considered, and still the precision is no better than half of the original interval for each variable.
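The relax-branch-bound-prune loop can be sketched in one dimension with Lipschitz bounds (a hypothetical illustration; the function, interval and constant L below are assumptions): on a piece [a, b] with midpoint m, f(m) − L(b − a)/2 is a valid lower bound and f(m) an upper bound.

```python
import heapq

def lipschitz_bb(f, a, b, L, tol=1e-3, max_nodes=100000):
    """1-D branch and bound using the Lipschitz lower bound f(m) - L*(b - a)/2."""
    def node(lo, hi):
        m = 0.5 * (lo + hi)
        fm = f(m)                                 # relax: bound over the piece
        return (fm - L * (hi - lo) / 2.0, lo, hi, m, fm)

    root = node(a, b)
    heap = [root]                                 # pieces ordered by lower bound
    best_x, best_f = root[3], root[4]             # incumbent (upper bound)
    while heap and max_nodes > 0:
        lb, lo, hi, m, fm = heapq.heappop(heap)
        if best_f - lb < tol:                     # bounds meet: done
            break
        for piece in ((lo, m), (m, hi)):          # branch
            n = node(*piece)
            if n[4] < best_f:                     # better incumbent found
                best_x, best_f = n[3], n[4]
            if n[0] < best_f:                     # prune pieces that cannot win
                heapq.heappush(heap, n)
        max_nodes -= 2
    return best_x, best_f

x_star, f_star = lipschitz_bb(lambda x: (x - 0.7) ** 2, 0.0, 2.0, L=3.0)
assert f_star < 1e-3 and abs(x_star - 0.7) < 0.05
```

Note how the bounds tighten as the pieces shrink, which is exactly the requirement mentioned above for reaching arbitrary precision.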
This method has been discarded in this work because of the excessive number of function evaluations it requires for problems with more than a few variables. This rules out the possibility to use the explicit problem representation with all lower-level variables y explicitly included. Instead, one has to work with the implicit programming representation, but an objective function evaluation for the implicit programming approach means solving the traffic assignment, which is costly for larger problems. Moreover, it is difficult to find the desired global information needed to form upper and lower bounds for the implicit programming formulation. Attempts were made to find a Lipschitz constant for the implicit programming formulation, but the resulting constants were too weak to be usable in a branch and bound method. Hence, no branch and bound method has been used.

7.5.2 NFFM

The filled function method was first presented in [Ge90] and is similar to the tunneling algorithm developed in [LM85]. In [ZXZ09] a filled function for nonsmooth problems (NFFM) is presented. The filled function method is a heuristic which doesn't give any guarantees on how good the obtained solution is. This method has been implemented by the author in MATLAB for the purpose of evaluation on the traffic network design problem. Therefore, a more detailed discussion of its inner workings is presented here.

Consider the problem to

    minimize_x  f(x)
    subject to  x ∈ X,                                        (44)

where X = {x ∈ R^n | g_i(x) ≤ 0, i = 1, . . . , m} and f is possibly nonsmooth and coercive, i.e., f(x) → ∞ as ‖x‖ → ∞. The idea behind NFFM presented in [ZXZ09] is the following: The problem is solved in two phases. In Phase 1, we find a local minimizer x_1^* of the original problem (44). Phase 2 starts with constructing a so-called filled function which possesses the property that a minimizer x_2^* of the filled function fulfills (C) f(x_2^*) < f(x_1^*). Additionally, x_1^* is a local maximizer of the filled function. A local descent method is then applied to the filled function in order to find x_2^*. If x_2^* fulfills (C), Phase 1 is entered with x_2^* as initial point. However, if no such point x_2^* is found, x_1^* is considered as the optimal solution. The formal definition of the filled function is given next ([ZXZ09]).

Definition 8. (Nonsmooth Filled Function) A function P(·, x_1^*) is called a filled function of f(x) at a local minimizer x_1^* if

(i) x_1^* is a strict local maximizer of P(·, x_1^*) on X,

(ii) for all x ∈ X_1 \ {x_1^*} or x ∈ R^n \ X, one has 0 ∉ ∂P(x, x_1^*), where X_1 = {x ∈ X | f(x) ≥ f(x_1^*)}, and

(iii) if X_2 = {x ∈ X | f(x) < f(x_1^*)} is not empty, then there exists a point x_2^* ∈ X_2 such that x_2^* is a local minimizer of P(·, x_1^*).

In [ZXZ09] a filled function is proposed, under the assumptions that f and g_i are Lipschitz continuous and coercive on R^n (which doesn't allow for other functions than those that are asymptotically linear or sublinear but still coercive). The proposed filled function—let us denote it by P(·, x_1^*, µ)—depends on a parameter µ > 0:

    P(x, x_1^*, µ) = −‖x − x_1^*‖_2 + µ ( f(x) − f(x_1^*) − Σ_{i=1}^m (g_i(x) − g_i(x_1^*)) )_+
                     + (1/µ) min[0, max{f(x) − f(x_1^*), g_i(x), i = 1, . . . , m}].        (45)

P is not a filled function for all values of µ; µ has to be small enough. It is shown in [ZXZ09] that P is a filled function when 0 < µ < 1/(L_f + Σ_{i=1}^m L_{g_i}) = µ_max, where L_f and L_{g_i} are global Lipschitz constants for f(x) and g_i(x) respectively.

The purpose of the filled function of a function and a minimizer is to transform the original function, so that the basin of the minimizer (basically, the points from which a perfect steepest descent trajectory would end up in the minimizer) is transformed into a hill (a basin for the negative

original function). This operation makes it possible for a local descent method to escape an already found local minimizer in order to find another, better one [Ge90]. We face two problems when using NFFM:

• Choosing an appropriate value of the parameter µ to make P a filled function.

• Making a local descent method find a local minimizer of the filled function that is promised by theory to exist.

This is where the heuristic of NFFM begins. In Phase 2, µ is initially chosen as a large number, and then gradually decreased by a factor µ̂ as long as no x_2^* fulfilling (C) has been found. When µ reaches a lower limit µ_L and no such point has been found yet, NFFM terminates.

If µ < µ_max (i.e., P is by theory proven to be a filled function), the first (distance) term in P (45) will "dominate" the second term, as will be described below, as long as f(x) ≥ f(x_1^*) and x ∈ X. In the proof of Theorem 2.2 (in [ZXZ09]), establishing Definition 8.(ii), it is shown that the scalar product between any of the tangents in ∂P(x, x_1^*, µ) and (x − x_1^*)/‖x − x_1^*‖ is a negative number, i.e., the gradient always points towards the last best minimizer x_1^*. This is a good property in the sense that we will never return to the old minimum, but under these conditions (µ < µ_max), µ is probably very small and there is a risk that the derivative information from the original problem (∂f(x)) will be dominated by the gradient of the distance to x_1^*, so that the filled function doesn't give much information about the original problem. It is likely that the local descent method will terminate claiming the problem is unbounded. This phenomenon is indeed illustrated in a numerical experiment in Section 9.4. From a practical point of view it is not necessary that µ < µ_max, since µ_max is derived from the global Lipschitz constants, and a larger value of µ can be enough for leaving the basin of x_1^*.
To help a local descent solver find points in X_2, NFFM initializes the minimization of the filled function at at least 2n points. We have chosen them as x_1^* + δ_D e_k, where the directions e_k, k = 1, . . . , 2n, are the positive and negative coordinate directions, and δ_D is a disturbance parameter that has to be selected. Further, it is mentioned in [ZXZ09] that the feasible set X needs to be extended for the minimization of the filled function; otherwise there is a risk that a local descent solver will stop in a minimum at the border, when it actually would have needed to take a detour outside the feasible region and return later. This puts some restrictions on the function f, since it has to be possible to evaluate it for x ∉ X. The filled function method is described in Algorithm 2.

For the network design application, X corresponds to the set of possible design parameters, and the constraint functions are

    g(x) = ( x_L − x )
           ( x − x_U ).

The objective function for the network design problem, F, is in general not scaled to order of magnitude 1, which is usually best for numerical methods. Therefore, a scaling parameter M is introduced. Important to note, though, is that it is the order of magnitude of the variations of F as x varies that should be about 1, so there is no point in scaling with M = 10^5 if we have a constant offset in F of 10^5. Further, it is not guaranteed that F can be evaluated outside the feasible set X, and it has to be extended in order to fulfill the requirement from the filled function method that f can be evaluated on the whole of R^n. This is done by clamping the design parameters x to the lower or upper bound for those components not within the box bounds. An objective function that can be evaluated at all points is

    F_clamped(x) = (1/M) F(min_com(x_U, max_com(x_L, x))),

where min_com and max_com are component-wise minimum and maximum operators, respectively. A requirement for the filled function method to be applicable is that f is coercive on R^n. This is not automatically satisfied by F_clamped in our application, since we use clamping to extend the region where F is not possible to evaluate, which makes F_clamped bounded from above on the whole space. What we then do is form f by adding penalty terms to F_clamped(x):

    f(x) = F_clamped(x) + [g(x)]_+^T [g(x)]_+,

where [s]_+ is a vector whose element i is max{0, s_i}. Note that the Lipschitz continuity assumption is broken here. We have thus violated some of the assumptions needed for the nice properties of the filled function in order to be able to use the method in practice.
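The clamping-and-penalty construction can be sketched as follows; a hypothetical stand-alone version where F, xL, xU and M stand for the network design objective, the box bounds and the scaling parameter (the actual implementation is in MATLAB):

```python
import numpy as np

def make_f(F, xL, xU, M=1.0):
    """Build f(x) = F_clamped(x) + [g(x)]_+^T [g(x)]_+ from an objective F
    that can only be evaluated inside the box [xL, xU]."""
    def g(x):                                           # box constraints g(x) <= 0
        return np.concatenate([xL - x, x - xU])
    def f(x):
        x = np.asarray(x, dtype=float)
        x_clamped = np.minimum(xU, np.maximum(xL, x))   # component-wise clamping
        F_clamped = F(x_clamped) / M                    # scaled objective
        viol = np.maximum(0.0, g(x))                    # [g(x)]_+
        return F_clamped + viol @ viol                  # penalty gives coercivity
    return f
```

Outside the box the penalty grows quadratically with the distance to it, which makes f coercive; note, as remarked above, that Lipschitz continuity is lost in the process.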

Algorithm 2 Filled function method (NFFM)
Initialization:
    Choose a disturbance constant δ_D, e.g., δ_D ← 1
    Choose a lower bound µ_L of µ, e.g., µ_L ← 10^−4
    Choose a reduction factor µ̂, e.g., µ̂ ← 0.5
    Choose directions e_k, k = 1, . . . , k_0 ≥ 2n
    Choose an initial point x_2^*
Main loop:
(A) Start from x_2^* and minimize f(x), x ∈ X, to obtain minimizer x_1^*
    µ ← 1
    while µ ≥ µ_L do
        k ← 1
        while k ≤ k_0 do
            Start from x_1^* + δ_D e_k and minimize P(x, x_1^*, µ), x ∈ R^n, to obtain minimizer x_2^*
            if x_2^* ∈ X then
                if f(x_2^*) < f(x_1^*) then Go to (A) else Go to (B) end if
            end if
            k ← k + 1
        end while
(B)     µ ← µ̂ µ
    end while
    return x_1^*

7.5.3 EGO

EGO stands for Efficient Global Optimization (see [JSW98]). It's a derivative-free method for costly black-box functions, and it works by sampling the search space rather than following a search trajectory through line search.

Initially a random set of points from the search space is generated. The number of points is based on the number of dimensions, in order to make the points "cover" the search space. From these points, a surrogate model of the objective function (called the true function) is generated by interpolating the initial points and assuming that values at points close to each other are more correlated than values at points far away from each other. This surrogate model predicts what function values one would get when sampling at different points in the search space. Based on the distance between the samples, a measure of uncertainty in the surrogate model can be computed. Combining the surrogate model and the uncertainty model gives a measure of expected improvement for each point in the search space. "Expected improvement at a point" here means the improvement in objective function value (versus the current best objective function value) one can expect by sampling at that point. Now this expected improvement function is maximized, and the point found will be the next sample to take from the objective function. Then this process is iterated, starting again with creating a new surrogate model. Figure 7 illustrates the principles of EGO on a one-dimensional problem.

Each iteration in this process is costly, even for objective functions that are cheap to evaluate. The time spent on generating the expected improvement function is motivated by the fact that all previously sampled points contribute information to it; this is a way of making as much use of the previously sampled points as possible, without having any a priori information about the structure of the underlying objective function.
Since each iteration is costly, this kind of method is only motivated for problems where each sample is costly as well. EGO also suffers from the curse of dimensionality, since each added dimension requires much more sampling to cover the whole search space.
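The expected improvement measure that EGO maximizes can be written down in a few lines. The sketch below is a simplification that assumes the surrogate model already provides a predicted mean mu and an uncertainty sigma at a candidate point; the closed form follows the normal-distribution model of [JSW98]:

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement (for minimization) at a point where the surrogate
    predicts mean mu and uncertainty sigma, given best sampled value f_best."""
    if sigma <= 0.0:                     # no uncertainty: improvement is known
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (f_best - mu) * Phi + sigma * phi
```

Points with a low predicted value or a high uncertainty both get a large expected improvement, which is what balances local refinement against global exploration in EGO.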

Figure 7: An illustration of how the EGO method works. Before the first iteration in (a), a number of points has been sampled and a surrogate function has been computed. Based on the surrogate function and the uncertainty in it, an expected improvement function is generated. The uncertainty is computed based on the assumption that points close to each other are more correlated than points far from each other. The expected improvement function is then maximized and the maximizer is the new point to sample. In (b), (c) and (d) we can follow how the EGO method would proceed on our one-dimensional example problem.

7.5.4 DIRECT

DIRECT was first presented in [JPS93], an article named "Lipschitzian Optimization without the Lipschitz Constant". DIRECT solves box-bounded Lipschitz continuous optimization problems and requires no derivatives, relying only on sampling.

The search space is divided into a set of disjoint hyperrectangles. Initially we have only one rectangle, with one midpoint. The objective function at the midpoint is evaluated.

At the start of each iteration, a subset of all rectangles is selected to be subdivided. The procedure for determining the subset is based on Lipschitzian global optimization, where one usually has a predetermined Lipschitz constant. In that case, we can compute the best possible objective function value that can be achieved within each rectangle by assuming a negative slope equal to the Lipschitz constant in the direction from the midpoint. At least one rectangle has the lowest value of this potential objective function value, and one of them can be chosen to be further subdivided. In our case the true Lipschitz constant is unknown, and instead all rectangles that have the potential (in the sense mentioned above) to contain the lowest objective function value for at least some Lipschitz constant (from zero to infinity) are selected to be subdivided. The set of these rectangles can be determined in quite an efficient way. Interesting to note is that the rectangle with the smallest objective function value at the midpoint will always be selected, since it is selected for a Lipschitz constant equal to zero. Also, the largest rectangle is always chosen, since it is selected for a Lipschitz constant approaching infinity.

Then the selected rectangles are subdivided into three new disjoint hyperrectangles, and the objective function values at the midpoints of the two new rectangles are computed. If a rectangle becomes too small, it is excluded from further subdivision. This procedure is iterated, starting again from selecting potential rectangles.
Details on how the subset of rectangles to be subdivided is determined in practice, and how a rectangle is subdivided, can be found in [JPS93].
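In one dimension, the selection-and-trisection loop can be sketched as follows; a simplified, hypothetical rendering of [JPS93] in which intervals play the role of rectangles and the potentially optimal ones are read off the lower convex hull of the (half-width, midpoint value) points:

```python
def _cross(o, a, b):
    """Cross product sign for the convex hull orientation test."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def direct_1d(f, a, b, n_iter=15):
    """Minimize f on [a, b] with a simplified one-dimensional DIRECT."""
    c = 0.5 * (a + b)
    intervals = [(c, 0.5 * (b - a), f(c))]      # (midpoint, half-width, value)
    for _ in range(n_iter):
        # keep one representative (lowest value) per half-width
        best = {}
        for iv in intervals:
            if iv[1] not in best or iv[2] < best[iv[1]][2]:
                best[iv[1]] = iv
        pts = sorted(best.values(), key=lambda iv: (iv[1], iv[2]))
        # lower convex hull over the (half-width, value) points
        hull = []
        for p in pts:
            while len(hull) >= 2 and _cross(hull[-2][1:], hull[-1][1:], p[1:]) <= 0:
                hull.pop()
            hull.append(p)
        # potentially optimal: the hull branch from the global minimum to the
        # largest interval (selected for some Lipschitz constant in (0, inf))
        i = min(range(len(hull)), key=lambda j: hull[j][2])
        selected = set(hull[i:])
        # trisect each selected interval; the middle third keeps the old midpoint
        new_intervals = [iv for iv in intervals if iv not in selected]
        for m, w, v in selected:
            w3 = w / 3.0
            new_intervals += [(m - 2 * w3, w3, f(m - 2 * w3)),
                              (m, w3, v),
                              (m + 2 * w3, w3, f(m + 2 * w3))]
        intervals = new_intervals
    best_iv = min(intervals, key=lambda iv: iv[2])
    return best_iv[0], best_iv[2]
```

Each iteration therefore spends only two new function evaluations per selected interval, while the hull computation decides where refinement is worthwhile.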

7.6 Implementation and Usage Details

This section concerns the most important implementation and usage details of the algorithms used.

7.6.1 DSDTAP

The DSD solver for the traffic assignment problem will be called DSDTAP and is a conversion from MATLAB to C++ of the implementation developed for [Jos03], with some modifications. Two major modifications are the support for elastic demand and the support for centroids (nodes which may constitute origins or destinations, but do not let any flow pass through them). Further, support for BPR cost functions, TU71 cost functions and the MNL elastic demand function was implemented.

As in [Jos03], two search direction generation techniques were implemented: projected gradient and constrained Newton (with diagonal approximation of the Hessian). See [LP92] for a description of the constrained Newton method. In contrast to what was expected, the constrained Newton method performed worse than the projected gradient method on larger networks such as Barcelona and Winnipeg, but it performed much better on problems with elastic demand. For larger problems, the algorithm has general problems converging. Elastic demand also made the algorithm converge much slower, probably because of the difference in scale of the derivatives between the excess demand link cost function and the regular cost functions.

Much effort was spent on increasing the performance of the solver, mostly through high-performance programming techniques such as parallelization. Algorithmic changes were also considered, but no significant changes from the version in [Jos03] were implemented. The solution from the DSD solver consists of routes and route flows. The routes are generated using Dijkstra's algorithm on the network, and it is checked that no already existing routes are added again. However, there is no mechanism that prohibits the addition of a route making the set of routes for an OD pair linearly dependent with respect to their arc-chains. In contrast to the implementation in [Jos03], the time spent on the column generation step is negligible compared to the time spent solving the RMP.
The Boost Graph Library implementation ([SLL02]) of Dijkstra's algorithm was used. In the column generation step we only want to find cycle-free routes (see Assumption 1.(i)). Using a shortest path algorithm for finding cycle-free paths requires each link to have a positive cost.

Profiling of the DSD solver showed that the mapping from route flow space to link flow space (which has to be done between search direction generation and a linesearch) is a very expensive operation. Much time is also spent on computing the exponentials and logarithms in the cost and demand functions, respectively. For many problems, the exponents are integers. Taking advantage of that speeds up the cost function evaluations a lot.
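The integer-exponent shortcut can be illustrated on a BPR-type cost function (a sketch; the parameter names t0, cap, alpha and beta follow the standard BPR form, and the actual DSDTAP code is C++):

```python
def bpr_cost(v, t0, cap, alpha=0.15, beta=4):
    """BPR link travel time t0 * (1 + alpha * (v / cap)**beta), with the
    power computed by repeated multiplication when beta is an integer."""
    x = v / cap
    if float(beta).is_integer():
        p = 1.0
        for _ in range(int(beta)):      # avoids the generic pow routine
            p *= x
    else:
        p = x ** beta                   # fall back to the general power
    return t0 * (1.0 + alpha * p)
```

For an integer beta, the repeated multiplication replaces the transcendental exp/log evaluation hidden inside a general power routine, which is where the speedup comes from.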

Elastic demand. The elasticity in demand was implemented by the use of the aforementioned excess demand method. The computation of d_k^max was made according to the discussion in Section 5.2.

Centroids. Some networks have nodes that only work as origins and destinations, where no flow through the node is permitted. Links connecting to these nodes are often OD connectors, which have no congestion effect, i.e., a constant cost function, and if flow through the nodes were permitted, we could end up having routes with a constant total cost function. Hence, these nodes require special caution when it comes to generating new routes in the network. See the left network in Figure 8. The dashed lines are OD connectors with constant cost functions, and the circles are nodes through which no flow may pass. If link a is too costly, there is a risk that a regular shortest path algorithm would return the route using only OD connectors from the origin to destination 2 through destination 1. This would be an illegal choice of route. To avoid this, the solution used in this implementation of the DSD solver is to, when solving for the origin, remove in-going links to the origin node and out-going links from all destinations. See the right network in Figure 8. This is done by setting the cost of those links to infinity. Note that the shortest path algorithm finds the shortest paths for one origin and all destinations at a time. In terms of our graphs G and G_k for all k ∈ C, the shortest path problem is, for each origin l ∈ O, solved for the graph G_l = (N, ∪_{k ∈ {k | ∃j: k=(i,j) ∈ C and i=l}} A_k), i.e., the "union" of the graphs G_k with origin l.


Figure 8: In-going links to the origin and out-going links from the destinations are removed. The dashed lines are OD connectors, while the solid lines are not.
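The link-removal trick can be sketched as follows; a toy Python version with an illustrative graph shaped like Figure 8 (the actual implementation uses the Boost Graph Library and masks links by setting their costs to infinity, as described above):

```python
import heapq
import math

def dijkstra(graph, source):
    """Shortest path distances from source in a dict-of-dicts graph."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, c in graph[u].items():
            if d + c < dist[v]:
                dist[v] = d + c
                heapq.heappush(heap, (dist[v], v))
    return dist

def mask_centroids(graph, origin, destinations):
    """Copy the graph with in-going links to the origin and out-going links
    from the destinations set to infinite cost."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    for u in g:
        if origin in g[u]:
            g[u][origin] = math.inf      # no flow back into the origin
    for d in destinations:
        for v in g[d]:
            g[d][v] = math.inf           # no through-flow at destinations
    return g
```

With the masking in place, a cheap connector route passing through a destination node can no longer be returned as a shortest path, so only legal routes are generated.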

7.6.2 SDBH

SDBH is implemented in MATLAB, based on the implementation of the descent algorithm described in Section 7.4.1, and described in detail in [Jos03]. A number of settings can be altered when using SDBH, and they are listed in Table 1.

Setting | Default | Description
Armijo Factor | 0.01 | Named τ in Algorithm 1.
Armijo Multiplier | 2 | Named κ in Algorithm 1.
Initial Steplength Factor | 8 | The initial steplength is set to the maximum possible steplength divided by this number.
Termination Tolerance | 10^−8 | Named T in Section 7.4.1.
BFGS Start Tolerance | 10^4 | The BFGS tolerance from Section 7.4.1 is set to T multiplied by this number.

Table 1: Settings for SDBH.

7.6.3 SNOPT

The TOMLAB /SNOPT interface was used in order to be able to call SNOPT ([GMS05]) from MATLAB. SNOPT has a large number of options (settings) that can be altered. We'll use the SNOPT default setting for most of them, but some of them are changed for our purposes. Table 2 lists some settings for which the SNOPT default has been overridden with a default value applying to this work.

Note that Function Precision is not changed. Its default value is far lower than the actual precision in our functions. However, it was noted that it was very difficult to set a good value for the function precision to get an optimality tolerance that didn't cause the process to stop early or continue endlessly. Therefore, the default optimality tolerance is also very low compared to our precision, and we can't expect the regular optimality conditions to be met, partly due to the bad precision of our functions, but also because of the nonsmoothness of the problem.

Setting | Default | Description
Linesearch Tolerance | 0.99 | Accuracy of linesearch. Higher means less accurate. The SNOPT manual suggests increasing this value for costly function evaluations.
Nonderivative Linesearch | Yes | No derivatives are used during linesearch. This is motivated by the fact that the precision of our gradients might be very bad.
Hessian Frequency | 10 | The Hessian approximation is reset to the identity matrix after 10 successive BFGS updates. The reason is that, since the objective function is nonsmooth, a Hessian approximation might become worse after stepping over nondifferentiable points.

Table 2: Settings for SNOPT.

7.6.4 LMBM-B

An implementation of LMBM-B has been written by its inventor, Napsu Karmitsa, and it was kindly shared to be used in this work. The implementation is written in FORTRAN and is described in [Kar07], and the FORTRAN routines are called from MATLAB using a MEX interface written by Seppo Pulkkinen. The MEX interface was originally written for the unconstrained version LMBM, and a few modifications were necessary to make it run for LMBM-B. LMBM-B has a number of settings that can be altered. We'll use the LMBM-B default setting for most of them, but some of them are changed for our purposes. Table 3 lists some settings for which the LMBM-B default has been overridden with a default value applying to this work.

7.6.5 NFFM

An implementation of NFFM was written in MATLAB. The implementation follows the method description in Section 7.5.2 with two exceptions:

• Instead of letting the minimization of the filled function be conducted on the whole of R^n, the original boundaries of the variables were expanded. More specifically, the interval was symmetrically elongated for each variable so that its length was tripled. The reason behind this is that the LMBM-B method (the subsolver used) had difficulties stopping for unbounded problems. The original reason for conducting the minimization on the whole of R^n is to not make the minimization get stuck at the boundaries. This is probably still avoided in practice by the expanded search space.

• In Algorithm 2, there is a line saying Go to (B). This line was removed in the implementation. The reason it is there is that the current value of µ is not good (i.e., it is larger than the inverse of the smallest possible Lipschitz constant) if a local minimum of the filled function at a point in the feasible set that has a worse objective function value of the original function than the current point

Setting | Default | Description
First tolerance for change in function values | Special | This is the absolute tolerance named L in Section 7.4.3. This value is always set to rel.error(F) found by combining data from Tables 8 and 10 for each problem.
Second tolerance for change in function values | 10^4 | See [Kar07].
Tolerance for first termination parameter | 10^−14 | This one is set so low that the default termination criterion isn't effective. (It is motivated by the discussion on termination criteria in Section 7.4.3.)
Maximum step size | 8 | Maximum step size in linesearch. The manual [Kar07] says this is a parameter that should be carefully tuned. The value 8 has been used with success for several problems.
Maximum small progress iterations | 2 | Known as IPAR(4) in the manual [Kar07]. Set to two, motivated by the discussion on termination criteria in Section 7.4.3.

Table 3: Settings for LMBM-B.

was found. The original idea is then that µ should be decreased. (In the theoretical development of this method, it is assumed that µ is smaller than a value µ_max, which depends on the Lipschitz constants of the involved functions.) However, in practice, it is worth searching all directions (i.e., letting k adopt all its possible values) before decreasing µ, because when µ is too small, the original objective function will be totally dominated by the distance term. See Figure 11(e) for an example.

The default values of the settings deviate somewhat from the proposals in the original paper [ZXZ09]. Since our variables often are of order of magnitude 10, the Disturbance Constant (δ_D) is set to 1 instead of 0.1. Further, Lower Mu (µ_L) is set to 10^−4 to reduce exhaustive search, and the Mu Reduction Factor (µ̂) has default value 0.5 instead of 0.1 (see Section 9.4 for a motivation of this choice). All settings and their respective default values and meanings are listed in Table 4.

Setting | Default | Description
Disturbance Constant | 1 | δ_D in Algorithm 2.
Lower Mu | 10^−3 | µ_L in Algorithm 2.
Mu Reduction Factor | 0.5 | µ̂ in Algorithm 2.
Objective Scaling Parameter | 1 | M from Section 7.5.2.

Table 4: Settings for NFFM.

7.6.6 EGO

For EGO, the TOMLAB implementation of EGO was used. The defaults were used for all settings except the maximum number of function evaluations. See Table 5 for the new default value.

Setting | Default | Description
Maximum Function Evaluations | 200 | Maximum number of function evaluations before the algorithm terminates.

Table 5: Settings for EGO.

7.6.7 DIRECT

The TOMLAB implementation glbDirect of DIRECT was used. The default settings were always used. In the coming experiments, only the Maximum Function Evaluations setting was changed between different runs.

8 Approach II: Cutting Constraint Algorithm

The second approach tried is the Cutting Constraint Algorithm (CCA) described in [LH04], which is a method that directly utilizes the variational inequality formulation from Section 6. The traffic assignment solution can be expressed as the solution to the variational inequality problem to find a vector y = (v, d) ∈ V such that

    C(x, y)^T (ŷ − y) ≥ 0,  ∀ŷ ∈ V,                           (46)

where C(x, y) = (t(x, v)^T, −D^−1(x, d)^T)^T. The variational inequality involves a universal quantifier (∀), which makes it impossible to immediately incorporate it as a constraint in a general nonlinear program, since a solver can't possibly explicitly check whether the inequality is fulfilled for all ŷ ∈ V. However, if V is a polytope, the variational inequality can be rewritten as a finite number of inequalities. Let ŷ^i, i = 1, . . . , n_e, be the extreme points of V. By the Representation Theorem, an arbitrary point ŷ ∈ V can be written as ŷ = Σ_i λ_i ŷ^i, Σ_i λ_i = 1, λ_i ≥ 0, and the left hand side of the variational inequality for this point is

    C(x, y)^T (ŷ − y) = C(x, y)^T ( Σ_i λ_i ŷ^i − y ) = Σ_i λ_i C(x, y)^T (ŷ^i − y).    (47)

Consider the following set of inequalities:

    C(x, y)^T (ŷ^i − y) ≥ 0,  i = 1, . . . , n_e.             (48)

Using (47), we can say that if y satisfies (48), then the VI (46) is also satisfied. The opposite is also true; if y satisfies the VI (46), it also satisfies (48), since the ŷ^i are elements of V. Hence, (46) and (48) determine the same solution set for y.

For the elastic demand case, the set V of feasible link flows and demands based on cycle-free routes is not a polytope, since the demands can be chosen arbitrarily large. However, the solution to a traffic assignment problem is always bounded, and one can include large enough artificial upper bounds on the demands in V to make it a polytope. (In practice, a representation of V will be used that also includes cycles in the routes.
However, if the cost functions t are positive, then only flow on cycle-free routes will be generated, since any route with a cycle is more expensive than the same route without the cycle.)

If our optimal solution to (17) is at y*, we do not need all n_e inequalities to be included in a program in practice to verify by a numerical method that it is an optimal solution. Only the active constraints in (48) are necessary, in addition to being sure that none of the excluded constraints are violated. The solution method proposed in [LH04] generates extreme points of V as they are needed in an iterative process.

The feasible set can be expressed through any of the representations mentioned in Section 4. One that is suitable for medium-scale problems is the origin-aggregate link-node representation, since it requires fewer variables than the link-node representation and no column generation, as the link-route representation does. However, column generation for this method could be an interesting combination to consider, since the link-route representation probably requires fewer variables for large-scale problems.

We define a master problem (49), which is the network design problem (17) with the variational inequality replaced by a subset of cardinality n_c of the inequalities in (48):

    minimize_(x,y)  F(x, y)
    subject to      x ∈ X,
                    y ∈ V,                                    (49)
                    C(x, y)^T (ŷ^i − y) ≥ 0,  i = 1, . . . , n_c.

Suppose problem (49) is solved with solution (x*, y*). Then we can test whether or not y* fulfills the variational inequality (46) by solving the linear programming problem

    minimize_y  C(x, y*)^T y,
    subject to  y ∈ V.                                        (50)

Denote the solution to (50) by y_L^*. Now, one can see that if

    C(x, y*)^T (y_L^* − y*) ≥ 0,                              (51)

the variational inequality (46) holds at (x*, y*), and if the solution additionally is a minimizer of (49), it is a minimizer of the full problem. On the other hand, if (51) fails to hold, y_L^* can be added as an additional constraint in the master problem with ŷ^(n_c+1) = y_L^*, and the new inequality will cut away the infeasible point y* from the feasible set of the master problem. This is the idea behind CCA. Algorithm 3 describes the CCA algorithm.

Algorithm 3 Cutting Constraint Algorithm (CCA)
    Let ŷ^1 = argmin{C(x_0, 0)^T y | y ∈ V}
    n_c ← 0
    repeat
        n_c ← n_c + 1
        Solve (49) and obtain solution (y^(n_c), x^(n_c))
        Solve (50) with y* = y^(n_c) and obtain solution ŷ^(n_c+1)
    until C(x^(n_c), y^(n_c))^T (ŷ^(n_c+1) − y^(n_c)) ≥ 0
    (y^(n_c), x^(n_c)) is our solution
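The control flow of Algorithm 3 can be sketched as follows; a schematic version with hypothetical callables solve_master (the master problem (49) with the current cut set), solve_lp (the LP (50)) and C (the cost mapping), and with an absolute tolerance in place of the exact test:

```python
import numpy as np

def cca(solve_master, solve_lp, C, tol=1e-7, max_iter=100):
    """Cutting Constraint Algorithm skeleton: add violated VI cuts until
    the variational inequality (46) holds at the master solution."""
    cuts = []                                    # generated extreme points of V
    x = y = None
    for _ in range(max_iter):
        x, y = solve_master(cuts)                # solve (49) with current cuts
        y_hat = solve_lp(x, y)                   # solve (50): min C(x, y)^T y over V
        gap = float(C(x, y) @ (np.asarray(y_hat) - np.asarray(y)))
        if gap >= -tol:                          # test (51): VI satisfied, stop
            break
        cuts.append(y_hat)                       # cut away the infeasible point y
    return x, y
```

Since every cut is an extreme point of the polytope V and these are finite in number, the loop terminates; in the actual implementation the master problem is handed to SNOPT with warm start information in each iteration.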

The CCA algorithm terminates after a finite number of steps, since the number of extreme points of a polytope is finite. The difficulty of CCA lies in solving the master problem (49), which grows in each iteration as new constraints are added. In general, the Mangasarian-Fromovitz constraint qualification (MFCQ) doesn't hold at all points of the master problem.

The error in the traffic assignment solution is given by the feasibility error of the nonlinear constraints upon termination. The reason is that the LP solver always generates constraints that are violated at the current iterate, if such constraints exist. If not, then we terminate, and then the sum of the absolute errors of the included constraints sums up to the duality gap in Section 4.4.

The algorithm was implemented in MATLAB using TOMLAB /SNOPT for the master problem and TOMLAB /CPLEX for the LP problem. In each iteration, after the new constraint has been added, SNOPT is started with warm start information provided. The warm start information consists of slack variables and Lagrange multiplier states from the previous run of SNOPT. The algorithm gained much in performance (at most a 50 times speedup per iteration was noted) by starting SNOPT with warm start information instead of only an initial point.

A relative tolerance of 10^−7 for the termination criterion was added, so that the algorithm was terminated when C(x^(n_c), y^(n_c))^T (ŷ^(n_c+1) − y^(n_c)) ≥ −10^−7 (ŷ^(n_c+1) − y^(n_c)).

Also, in every iteration, it is possible to take the current design parameters x, get the traffic assignment solution using a traffic assignment solver, and compute the value of the objective function for the current design parameters. This value can then be compared with the objective function value of the optimal solution of (49) to give an estimate of the quality of the solution.

For solving the SMPEC, we used the discretization schemes (SMPEC)_N and (SRPEC)_N.
All N scenarios were included in one NLP, each scenario with its own flow solution vector and flow constraints. The linear constraints are uncoupled with respect to the scenarios. However, the N variational inequalities are coupled in the design variables. In each iteration of Algorithm 3, one constraint for each scenario is added to the set of constraints.

One idea for increasing the performance on the stochastic problem was based on the thought that perhaps many of the scenarios share extreme points, and that these could be generated by solving the network design problem for only one scenario. This could be solved relatively fast, and the solution (in terms of not only the variables x, but also slack variables and Lagrange multipliers) could be supplied as warm start information for a run with all scenarios included. This was tested, but it was noted that only very few of the generated extreme points from the one-scenario problem were used for that particular scenario in the larger run (with all scenarios), and none of the generated extreme points from the one-scenario problem were used by the other scenarios in the larger run. Hence, it only generated a larger problem which took longer to solve overall.

9 Numerical Experiments

The numerical experiments conducted here are the main contributions of this work, and their purpose is to answer our questions on whether local methods are sufficient or global methods can find significantly better solutions to network design problems, how the methods scale with network size, and to what extent it is possible to solve stochastic network design problems. Four networks of varying sizes have been selected for the numerical experiments: Harker and Friesz, Sioux Falls, Stockholm and Anaheim. The Harker and Friesz network consists of sixteen links, Anaheim of 914 links, and the sizes of Sioux Falls and Stockholm are in between. We have two versions of the Sioux Falls network: one with fixed demand and one with elastic demand. The Stockholm network is modeled with elastic demand, and the remaining networks with fixed demand. Section 9.1 lists the networks and their properties. All solvers from Approach I and II will be used in the experiments. The Approach I solvers are:

• SDBH (local). Description in Section 7.4.1. Implementation in Section 7.6.2.

• SNOPT (local). Description in Section 7.4.2. Implementation in Section 7.6.3.

• LMBM-B (local). Description in Section 7.4.3. Implementation in Section 7.6.4.

• NFFM (global). Description in Section 7.5.2. Implementation in Section 7.6.5.

• EGO (global). Description in Section 7.5.3. Implementation in Section 7.6.6.

• DIRECT (global). Description in Section 7.5.4. Implementation in Section 7.6.7.

The Approach II solver is: CCA. Description and implementation in Section 8. Sections 9.2–9.12 present the numerical experiments: how they were performed, their results and also a short discussion. The experiments can be categorized as follows:

• Meta-experiments: Investigate the methods for the purpose of preparing them for the main experiments.

– Section 9.2: Precision of Objective Function. Aims to find the precision of our Approach I objective functions in order to be able to set suitable termination criteria for each problem.
– Section 9.3: Evaluation of Rules for Defining Used Routes. Compares the practical performance differences from using the rules mentioned in Section 7.2.

• Illustrative examples: This category includes two illustrative examples.

– Section 9.4: NFFM on Six-Hump Camel Function. An example of how NFFM solves a two-dimensional global optimization problem, illustrated with figures to give a picture of the workings of the NFFM method.
– Section 9.5: DSDTAP on a Trivial Elastic Demand Problem. An example of how elasticity in a trivial network is modeled using the MNL elastic demand functions.

• Main experiments: This is the largest category of experiments, and these experiments aim to give answers to our questions.

– Section 9.6: Time Complexity in Number of Scenarios of LMBM-B and CCA for Stochastic Extension. Compares the efficiency of Approach I and Approach II on the discretized stochastic extension problem. This is done by investigating the time complexity in the number of scenarios for the Approach I solver LMBM-B and the Approach II solver CCA.
– Section 9.7: Sioux Falls with Elastic Demand First-Best Toll Pricing Problem (SFE FB TP). The exact solution of the first-best toll pricing problem can easily be computed using marginal-cost toll pricing. When the solution is known it is easier to compare the convergence of the methods. The three Approach I solvers SDBH, SNOPT and LMBM-B are compared.

The following four experiments have a similar experimental setup. The sections are separated by problem rather than experiment, and for each section several experiments are performed. For each problem, local and global optimization based on Approach I is performed. The purpose is to compare the local and global methods in order to see whether the global methods can find significantly better solutions than the local solvers. For each problem, except the Stockholm problem, the CCA method from Approach II is used and compared with the Approach I solvers. For two problems, Harker and Friesz and Stockholm, stochastics are added and a stochastic extension of the problems is solved using Approach I solvers.

– Section 9.8: Harker and Friesz Capacity Expansion Problem (HF CEP). This experiment exposes some weaknesses of the NFFM and EGO methods, which are further investigated in this section.
– Section 9.9: Sioux Falls with Fixed Demands Capacity Expansion Problem (SFF CEP).
– Section 9.10: Stockholm Toll Pricing Problem with Cordon J2 (STHLM J2 TP).
– Section 9.11: Anaheim Capacity Expansion Problem (ANA CEP).

The last experiment shows the limits of the proposed solution methods.

– Section 9.12: Barcelona.

9.1 Problem Set

To make it easy to reference a particular problem setup, a name coding system is introduced here. Each name has the format

    NETWORK [GROUP] [S](CEP|TP) [EXP|CVAR]

where the symbols correspond to the following information:

• NETWORK: Name of network, e.g., HF.
• GROUP: Name of toll gate group set or expansion group set, e.g., J1. Empty if there is only one group set for the network.
• S: S if stochastic, otherwise empty.
• CEP, TP: CEP if the problem is a capacity expansion problem, or TP if the problem is a toll pricing problem.
• EXP, CVAR: If stochastic, EXP means the objective function is the expectation value, and CVAR means the objective function is C-VaR.
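As a small illustration, a name in this coding scheme can be decomposed mechanically. The sketch below is a hypothetical Python helper (the space-separated token layout is an assumption, not part of the thesis):

```python
# Hypothetical parser for the name coding scheme
# NETWORK [GROUP] [S](CEP|TP) [EXP|CVAR].
def parse_problem_name(name):
    info = {"network": None, "group": None, "stochastic": False,
            "type": None, "objective": None}
    tokens = name.split()
    info["network"] = tokens[0]
    for tok in tokens[1:]:
        if tok in ("CEP", "TP", "SCEP", "STP"):
            info["stochastic"] = tok in ("SCEP", "STP")   # leading S
            info["type"] = "CEP" if tok.endswith("CEP") else "TP"
        elif tok in ("EXP", "CVAR"):
            info["objective"] = tok
        else:
            info["group"] = tok
    return info

parsed = parse_problem_name("STHLM J2 STP EXP")
```

For example, "STHLM J2 STP EXP" decodes to the Stockholm network with toll gate group set J2, a stochastic toll pricing problem with the expectation-value objective.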

9.1.1 Harker and Friesz CEP (HF CEP) This small-scale network consists of 16 links, 6 nodes and 2 OD pairs. It originates from [HF84] and is known as the Harker and Friesz network (HF). It was chosen for inclusion since earlier experiments ([Jos03]) have shown that a CEP on it has several local minima. The BPR functions are used as the cost function model. No OD connectors are used in this network. In this CEP, all links are subject to possible expansion, and the investment costs for expansion of the links are the following linear functions:

φa(ρa) = caρa, a ∈ A, where ca in this case are the constants given in Table 6. Note that we can index the expansion groups by a instead of i since W is the identity matrix here, and there is a one-to-one relation between expansion groups and links. The upper bound for all expansion parameters is 20 while the lower bound is 0. If nothing else is indicated, the starting point is all parameters equal to zero.

a   c_a    a   c_a    a    c_a    a    c_a
1   2.0    5   9.0    9    2.0    13   5.0
2   3.0    6   1.0    10   5.0    14   3.0
3   5.0    7   4.0    11   6.0    15   6.0
4   4.0    8   3.0    12   8.0    16   1.0

Table 6: Constants for investment costs in HF network.
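With the constants of Table 6, the total investment cost of an expansion vector ρ is simply the sum of c_a ρ_a over the expanded links. A minimal sketch (constants transcribed from the table; the bound check mirrors the stated upper bound of 20):

```python
# Linear investment cost for the HF CEP, phi_a(rho_a) = c_a * rho_a
# (constants c_a transcribed from Table 6).
c = {1: 2.0, 2: 3.0, 3: 5.0, 4: 4.0, 5: 9.0, 6: 1.0, 7: 4.0, 8: 3.0,
     9: 2.0, 10: 5.0, 11: 6.0, 12: 8.0, 13: 5.0, 14: 3.0, 15: 6.0, 16: 1.0}

def investment_cost(rho):
    """Total cost of an expansion vector rho = {link: expansion}."""
    assert all(0.0 <= r <= 20.0 for r in rho.values())  # bounds from the text
    return sum(c[a] * r for a, r in rho.items())

# Expanding link 16 to its upper bound (as fixed in HF 2 CEP below)
# costs 1.0 * 20 = 20.
cost = investment_cost({16: 20.0})
```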

HF 2 CEP. We’ll also use another version of this problem, where only links 2, 3, 6, 8 and 14 are subject to expansion and the expansion of link 16 is fixed to 20. This modification of HF CEP will be called HF 2 CEP.

HF SCEP EXP/CVAR. A third version of this problem is HF SCEP EXP, which is a stochastic extension of the deterministic problem, where the capacities of links 2, 3 and 16 are stochastic variables. The BPR cost functions of those links are subject to stochastics according to (37) and (38) in Section 6.9, where ω ∈ 1 + U(−1/5, 1/5), and U(a, b) is the uniform probability distribution on the interval [a, b]. For HF SCEP CVAR, the parameter β is equal to 0.8 or 0.9. Note that there is only one random variable that affects the three links!
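A scenario of this stochastic extension can be sketched as follows. The standard BPR form t_0 (1 + b (v/c)^4) is assumed here purely for illustration; the thesis's exact parametrization is the one in (37) and (38), and the coefficient values below are made up:

```python
import random

# Sketch of a stochastic BPR link cost: the capacity of a stochastic link
# is scaled by omega in 1 + U(-1/5, 1/5). The BPR form and the numbers
# below are illustrative assumptions, not (37)-(38) from the thesis.
def bpr_cost(v, t0, cap, b=0.15, power=4):
    return t0 * (1.0 + b * (v / cap) ** power)

def sample_omega(rng):
    return 1.0 + rng.uniform(-0.2, 0.2)   # omega in 1 + U(-1/5, 1/5)

rng = random.Random(0)                     # fixed seed for reproducibility
omega = sample_omega(rng)
# Scenario cost on a stochastic link: capacity scaled by omega.
t = bpr_cost(v=10.0, t0=1.0, cap=omega * 8.0)
```

In a discretized SMPEC, one such ω is drawn per scenario, and the same draw is applied to all three affected links (a single random variable, as noted above).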

9.1.2 Sioux Falls Fixed Demand CEP (SFF CEP) Sioux Falls is a classic network of 76 links, 24 nodes and 528 OD pairs, often used for evaluating methods for traffic assignment and network design. It occurs in several different versions, and here we use the version published in [LMP75], modified for the rush hour by dividing all capacities by 1000, and dividing the cost function by 100. The demand is divided by 1000, and then multiplied by 1.1 (i.e., in total multiplied by 0.0011). This is the setup used in [Jos03]. The BPR functions are used as cost function model. The capacity can be expanded on the ten links listed in Table 7. The investment functions are defined as follows:

    φ_i(ρ_i) = c_i ρ_i^2,

where c_i in this case are the constants given in Table 7. Here we have only one link a per expansion group i.

i   a    c_i       i    a    c_i
1   16   0.026     6    26   0.025
2   17   0.04      7    29   0.048
3   19   0.026     8    39   0.034
4   20   0.04      9    48   0.048
5   25   0.025     10   74   0.034

Table 7: Constants for investment costs in SFF network.

The upper bound for all expansion parameters is 25 while the lower bound is 0. If nothing else is indicated, the starting point is all parameters equal to zero. No OD connectors are used in this network.

SFF SCEP EXP. SFF SCEP EXP is a stochastic version of SFF CEP, where the parameter b_{a,1} in the BPR cost functions is multiplied by a random number (1 + U(−1/4, 1/4)) in each scenario, and the stochastic objective function is the expectation value of the deterministic objective function.

9.1.3 Sioux Falls Elastic Demand TP (SFE TP) In [Eks08] a Sioux Falls network modified for elastic demands is presented. The arcs and nodes are the same as in the original paper [LMP75], but the cost functions and demands are different. The cost

functions are of the BPR kind, and the elastic demand is an MNL modal split model. A full description of the network, with parameters for the BPR cost functions and MNL inverse demand functions, can be found in [Eks08]. We let the dispersion parameter α be equal to 0.025, which is the same value as used in the numerical experiments in [Eks08], even though that thesis says 0.05. Ekström made this correction in personal communication with the author of this work. For this network, only first-best toll pricing will be considered, i.e., all links are tollable. The problem will be denoted SFE FB TP. No OD connectors are used in this network.

9.1.4 Small Stockholm Elastic Demand TP (STHLM TP) In some of the numerical experiments a Stockholm network kindly provided by Clas Rydergren2 and Joakim Ekström3 has been used. This network was chosen in an attempt to focus on realistic instead of small-scale problems, and is of medium scale. It has 392 links, 149 nodes and 1547 OD pairs. The Stockholm network is a two-modal model where the first mode is car traffic and the second is public transport. The public transport mode is modeled through elastic demand, which means all flows on the links in the network are from the car traffic mode. The cost functions are the TU71 functions (see Section 5.1 for a description of the TU71 functions). The two-mode feature is modeled by the use of MNL, and the dispersion parameter α is set to 0.07. A congestion toll charging system is already in use in the city of Stockholm, and in our experiments we will use a toll charging cordon that is similar to the one in use in Stockholm today. We will consider two cases here: J1, with one toll gate group only, and J2, with two toll gate groups, one for the Northern links and one for the Southern links. The two problems will be denoted by STHLM J1 TP and STHLM J2 TP, respectively. The upper bound for all toll prices is 30 and the lower bound is 0. The default initial point is all toll prices equal to 15. The initial point is not chosen as zero, because the traffic assignment solver takes a long time to converge there. OD connectors are used in this network, but they do not form cycles; hence, Theorems 6 and 8 can be applied for establishing single-valuedness and strong regularity of the traffic assignment solution mapping. Figure 9 shows a map of the network.

STHLM J2 STP EXP/CVAR. The capacity (c_{a,3}) on the two links (both ways) constituting Essingeleden (marked by an asterisk in the map in Figure 9) is in this problem multiplied by a random variable (1 + U(−1/5, 1/5)). For the conditional value-at-risk problems, values 0.8, 0.9 and 0.95 of β are considered.

2Assistant Professor, Department of Science and Technology, Linköping University, Norrköping, Sweden. 3Ph.D. student, Department of Science and Technology, Linköping University, Norrköping, Sweden.


Figure 9: A map of the Stockholm network with an embedded zoom-in of the central city. Toll gates are marked by thicker lines crossing the links they are on. The J2 cordon corresponds to grouping the seven Northern toll gates into one group, and the three Southern toll gates into another. The link marked by a big asterisk in the zoomed-in map is “Essingeleden”, which is the only toll-free route linking the Southern part of Stockholm with the Northern part in this model. The single links connected at only one end are OD connectors.

9.1.5 Anaheim Fixed Demand CEP (ANA CEP) Anaheim is another medium-scale network with 914 links, 416 nodes and 1406 OD pairs, which uses BPR functions for travel cost modeling. The source of the network data is [BG10]. The data has been slightly modified though: the demands and capacities have been scaled by a factor 0.01 in order to make the scales as close to 1 as possible. Note that this rescaling only affects the numerics; the traffic assignment solution is unaffected. For this network, we will solve the capacity expansion problem, and hence need to find a set of candidate links, since no source has been found where this has already been done. Investment costs also need to be determined. We use the following approach: we want each expansion group to contain links in approximately the same geographical area, so that we can make capacity enhancements at spots in the network that constitute bottlenecks. Therefore, we do not choose the link-expansion group relations randomly, since that would cause links belonging to the same expansion group to be geographically scattered. Instead, the links in each expansion group should form a connected subgraph. To achieve this we do the following: we will have M expansion groups, each containing N links. The expansion groups will be assigned in the following way:

1. Sort all links in descending order of the marginal cost v_a ∂t_a(v_a)/∂v_a at user equilibrium for the non-expanded network, and put them in a list (we will call it the list) in that order.

2. Consider one of the M initially empty expansion groups not considered yet.

3. Move the first link of the list to the currently considered expansion group and remove it from the list.

4. Find the first link in the list which shares at least one node with any of the links in the currently considered expansion group. Add that link to the currently considered expansion group and remove it from the list. Repeat this step until the group contains N links.

5. Repeat from step 2 until all M groups have been considered.

This scheme will not be possible to follow for all networks and values of M and N. However, if we choose M = 10 and N = 30, the scheme works and the expansion groups can be assigned according to it. For the investment costs, we use a quadratic function:

    φ_i(ρ_i) = c_i ρ_i^2.

The expansion group definitions and the values of the constants c_i can be found in Tables 29 and 30, respectively, in Appendix A.1. The upper bound for all expansion parameters is 20 while the lower bound is 0. As starting point, all expansion parameters are set to zero if nothing else is indicated. OD connectors are used in this network, but they do not form cycles; hence Theorem 6 can be applied for establishing uniqueness of the traffic assignment solution.
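The grouping steps above can be sketched compactly. The following is a hypothetical Python sketch with toy link data (not the Anaheim network); links are (tail, head) node pairs already sorted by descending marginal cost, and each group is grown to N links in total:

```python
# Greedy assignment of links to M expansion groups of N links each,
# following steps 1-5 above. `links` must already be sorted by descending
# marginal cost v_a * dt_a/dv_a; each link is a (tail, head) node pair.
def assign_groups(links, M, N):
    remaining = list(links)
    groups = []
    for _ in range(M):
        group = [remaining.pop(0)]        # step 3: seed with the top link
        nodes = set(group[0])
        while len(group) < N:             # step 4: grow a connected group
            idx = next(i for i, link in enumerate(remaining)
                       if nodes & set(link))   # first link sharing a node
            link = remaining.pop(idx)
            group.append(link)
            nodes |= set(link)
        groups.append(group)
    return groups

# Tiny example: four links on a path graph, M = 2 groups of N = 2 links.
groups = assign_groups([(1, 2), (3, 4), (2, 3), (4, 5)], M=2, N=2)
```

As the text notes, the `next(...)` lookup raises an exception when no remaining link touches the current group, i.e., the scheme fails for some networks and choices of M and N.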

9.1.6 Barcelona Fixed Demand (BARC) Barcelona is the largest network considered with 2522 links, 1020 nodes and 7922 OD pairs, and uses BPR functions for travel cost modeling. The source of the network data is [BG10]. This network is only considered in the experiment in Section 9.12.

9.1.7 Summary of Problems Table 8 lists all considered problems with their associated properties and parameters. These are used as defaults in all numerical experiments if nothing else is indicated.

Problem       # links   # nodes   # OD pairs   # parameters   x^U   x^0   ε_DSD
HF CEP        16        6         2            16             20    0     10^-10
HF 2 CEP      16        6         2            5              20    0     10^-10
SFF CEP       76        24        528          10             25    0     10^-10
SFE FB TP     76        24        528          76             25    0     10^-10
STHLM J2 TP   392       149       1547         2              30    15    10^-7
ANA CEP       914       416       1406         10             20    0     10^-9
BARC          2522      1020      7922         –              –     –     –

Table 8: A summary of problem parameters for each considered problem.

9.2 Precision of Objective Function Here we want to investigate how the precision of the network design problem objective function depends on the precision of the underlying DSD solver, i.e., on the relative gap in the traffic assignment solution. A number of problems were considered, and for each problem, the objective function at 10 randomly generated points was computed for a set of different tolerances ε_DSD of the relative gap. For each point, the objective function value obtained for the smallest tolerance was considered the exact value, and the relative objective function value error was plotted versus the reached relative gaps. The problems included in the experiment are listed in Table 9, together with the smallest tolerances that were used and the type of objective function (see Section 6 for objective function definitions). The reason a larger tolerance of the relative gap was used for some of the problems is that the DSD solver has problems converging to higher accuracy for those problems. The results are presented in Figure 10.

51 100 100

10−2 10−2

10−4 10−4

10−6 10−6

10−8 10−8

10−10 10−10

10−12 10−12 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2

(a) HF CEP (b) SFF CEP

100 100

10−2 10−2

10−4 10−4

10−6 10−6

10−8 10−8

10−10 10−10

10−12 10−12 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2

(c) SFE FB TP (d) STHLM J1 TP

100 100

10−2 10−2

10−4 10−4

10−6 10−6

10−8 10−8

10−10 10−10

10−12 10−12 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−10 10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2

(e) STHLM J2 TP (f) ANA CEP

Figure 10: Error in relative social surplus plotted versus relative gap of the traffic assignment solution. Each point corresponds to a run at a random point in the search space and a tolerance level of the relative gap. The lines connect the runs done at the same random point.

Problem       Smallest ε_DSD used   Objective function
HF CEP        10^-10                UC
SFF CEP       10^-10                UC
SFE FB TP     10^-10                −∆SS
STHLM J1 TP   10^-8                 −∆SS
STHLM J2 TP   10^-8                 −∆SS
ANA CEP       10^-9                 UC

Table 9: Problems considered in the social surplus precision experiment.

Discussion. For the Harker and Friesz network (HF CEP), which is a very small network, the relative error in social surplus is of the same order of magnitude as the relative gap. Increasing the size a bit, to the Sioux Falls network with fixed demands (SFF CEP), the relative error is one order of magnitude larger than the relative gap. Adding elasticity in SFE FB TP adds another order of magnitude to the error, and we have a relative error that is more than two orders of magnitude larger than the relative gap. The same applies to STHLM J1/J2 TP, which are also subject to elastic demand. However, the largest network, ANA CEP, keeps the difference between relative error and relative gap below two orders of magnitude at worst. A trend can be observed: the larger the network, the larger the relative error in social surplus, and for our problem sizes the relative error can be more than two orders of magnitude higher than the relative gap.

A possible reason why the relative error increases so much when introducing elasticity could be that the negative difference in social surplus is used instead of user cost. User benefit (UB) and user cost (UC), of which social surplus (SS) is the difference (SS = UB − UC), are of the same order of magnitude, which gives rise to cancellation error when computing SS. Besides, ∆SS might at points close to the origin (the non-modified network) be some orders of magnitude smaller than SS. Both these factors cause the absolute value of ∆SS to be (perhaps orders of magnitude) smaller than UC.

Nevertheless, these graphs give us a hint of what precision we have. This kind of tool can be used to see what one can expect in terms of precision of an optimal solution, and also to help tune the parameters of optimization solvers for these problems. Table 10 shows the relation the plots in Figure 10 suggest between the relative gap in the traffic assignment solution and the relative error in the objective function value.

Problem       rel.error(F) = 10^x × ε_S
HF CEP        x ≈ 0
SFF CEP       x ≈ 1
SFE FB TP     x ≈ 2.5
STHLM J1 TP   x ≈ 2
STHLM J2 TP   x ≈ 2
ANA CEP       x ≈ 1.5

Table 10: Relation between the relative gap in the TAP (ε_S) and the relative error in objective function value (rel.error(F)).
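The cancellation effect suspected above is easy to reproduce in isolation. The following sketch uses made-up magnitudes (not thesis data) to show how a relative input error of 10^-10 in UB and UC is amplified by several orders of magnitude in the difference SS = UB − UC when the two terms agree to five digits:

```python
# Illustration of cancellation when forming SS = UB - UC from two nearly
# equal quantities. Magnitudes are made up for the demonstration.
ub_exact, uc_exact = 1.000010e6, 1.000000e6
ss_exact = ub_exact - uc_exact            # = 10.0

rel_err = 1e-10                           # relative error in each input
ub_noisy = ub_exact * (1 + rel_err)
uc_noisy = uc_exact * (1 - rel_err)
ss_noisy = ub_noisy - uc_noisy

# The relative error in SS is about rel_err * (UB + UC) / SS, i.e.
# roughly five orders of magnitude larger than the input error here.
rel_err_ss = abs(ss_noisy - ss_exact) / abs(ss_exact)
```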

9.3 Evaluation of Rules for Defining Used Routes

In Section 7.2, rules for defining the index sets Π_k^1, Π_k^2 and Π_k^3 are discussed. This experiment investigates the practical difference between those rules. The problems in Table 11 were solved using LMBM-B, once with Rule A, and with Rules B and C for different values of fuzz. The termination criterion value ε_L was chosen with guidance from the plots in Figure 10 and Table 10. The following data was recorded:

• The fuzz parameter (if applicable) (fuzz).

• Objective function value at final point (F ).

• Number of function evaluations required (N_F). The results are presented in Table 11.

Problem       Rule (fuzz)   F          N_F
HF CEP        A (–)         529.8093   26
              B (10^-3)     529.8093   26
              B (10^-5)     529.8093   26
              B (10^-7)     529.8093   26
              B (10^-9)     529.8093   26
              B (10^-11)    529.8093   26
              C (10^-3)     529.8093   26
              C (10^-5)     529.8093   26
              C (10^-7)     529.8093   26
              C (10^-9)     529.8093   26
              C (10^-11)    529.8093   26

SFF CEP       A (–)         79.98865   134
              B (10^-3)     79.99430   322
              B (10^-5)     79.98885   88
              B (10^-7)     79.98878   117
              B (10^-9)     79.98865   105
              B (10^-11)    79.98865   134
              C (10^-3)     skipped    skipped
              C (10^-5)     79.98915   56
              C (10^-7)     79.98865   134
              C (10^-9)     79.98865   135
              C (10^-11)    79.98865   135

ANA CEP       A (–)         13383.81   43
              B (10^-5)     13382.18   68
              B (10^-7)     13382.20   54
              B (10^-9)     13383.96   40
              B (10^-11)    13383.68   49
              C (10^-5)     13383.81   43
              C (10^-7)     13383.81   43
              C (10^-9)     13383.81   43
              C (10^-11)    13383.81   43

STHLM J2 TP   A (–)         −258910.2  46
              B (10^-5)     −260552.0  20
              B (10^-7)     −260552.6  20
              B (10^-9)     −259379.5  113
              B (10^-11)    −260014.9  114
              C (10^-5)     −260552.0  20
              C (10^-7)     −260552.6  20
              C (10^-9)     −259379.5  113
              C (10^-11)    −260014.9  114

Table 11: Results from rule tests.

Discussion. HF CEP is not at all affected by the choice of rule. This is because no nondifferentiability is encountered during the minimization process. Rule A, being parameter-less, performs quite well on all problems. For Rule B we have an interval between 10^-5 and 10^-7 that seems to work well on all problems. The fuzz should not be chosen too large (10^-3 for SFF CEP) or too small (10^-9 for STHLM J2 TP), because then N_F increases. Rule C behaves similarly to Rule B. No significant conclusion can be drawn, but as the rule for defining used routes for the remainder of this work, Rule B with fuzz = 10^-7 will be chosen.

9.4 NFFM on Six-Hump Camel Function Here we investigate the properties of the filled function global optimization method. We consider the problem of minimizing the Six-Hump Camel function, i.e., solving the following two-dimensional problem:

    minimize_(x1,x2)  (4 − 2.1 x1^2 + (1/3) x1^4) x1^2 + x1 x2 + (−4 + 4 x2^2) x2^2     (52)
    subject to        −3 ≤ x1 ≤ 3,
                      −2 ≤ x2 ≤ 2.

A two-dimensional problem was selected in order to be able to plot the function and the filled function, with the thought that it can give a feeling for how the filled function method works. The objective function is coercive, so no penalty terms were added to the objective function. Figure 11(a) shows a contour plot of the function over the feasible region. It has six local minima, and hence is a good subject for NFFM. We pick the point (1.3, −1) as starting point, because then we will not fall down into one of the actual global minima at once. The filled function method starts by doing a local minimization with the local descent solver (SDBH). In Figure 11(b) the steps made by SDBH are plotted. The new minimizer is then our best point so far and will be referred to as x∗, while the objective function value there will be called f∗.
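Problem (52) can be sanity-checked numerically: the Six-Hump Camel function is a standard global optimization test function whose two global minima lie at approximately (0.0898, −0.7126) and (−0.0898, 0.7126), with value about −1.0316 (coordinates quoted from the standard test-function literature, not from the thesis):

```python
# Six-Hump Camel objective from (52).
def camel(x1, x2):
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

# Evaluate at one of the (approximate) global minimizers.
f_star = camel(0.0898, -0.7126)   # close to -1.0316
```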

Then the filled function is constructed. As mentioned in Section 7.5.2 there is a parameter µ that is decreased gradually in an iterative process, and we have different filled functions for different values of µ. Figure 11(c) shows the first filled function iteration, where µ = 1. After this step, we are still in the original basin. After µ has been decreased to 0.1, interesting things happen. In Figure 11(d) we can see how the filled function method finds one of the global minimizer basins, marked by A in the figure. From the contour lines we can see that there was a chance that we could have missed the new basin. The function values are decreasing to the left, and we could have ended up in the left minimizer, marked by B. That would not have been an improvement over the original point, and the new point would have been discarded. At C we can see how the method suffers from using the steepest descent search direction combined with only the Armijo condition for the line search.

After the basin belonging to a global minimizer has been found, a local minimization of the original function is started and the true global minimizer is found. Then the algorithm starts over from there, with µ = 1 and a new filled function based on the new best point. Nothing better will be found, and the method will terminate.

Let us assume that the basin was not found while µ = 0.1, i.e., that the descent method missed the chance of finding it (perhaps by taking too long an initial step). The next value of µ is 0.01. Figure 11(e) shows how the filled function would look for this value of µ. It is basically a function whose value equals the distance between the point and x∗, except for the two spots where the function value of the original function is lower than f∗. Also shown in the figure are the trajectories of three attempts at minimizing the filled function from three different initial points.
The first two attempts fail miserably, while the third attempt actually finds its way into one of the basins. It is evident that this was pure luck and happened because of the way we picked the initial points. This is an example of what was mentioned in Section 7.5.2 about the distance function dominating the original function.

Discussion. This experiment shows that the filled function method is very sensitive to the value of µ. Reducing µ by a factor of ten each iteration is maybe a little too much. Perhaps one should halve it instead to increase the chance of using a good value of µ. In the remaining experiments with NFFM, the reduction factor for µ has been set to 1/2.

[Figure 11: contour plots of (a) the Six-Hump Camel function, (b) the first local minimization, (c) the first filled function iteration (µ = 1), (d) the second filled function iteration (µ = 0.1), (e) the hypothetical third filled function iteration (µ = 0.01).]

Figure 11: Plots of (a) the Six-Hump Camel function, (b) the first local minimization of the Six-Hump Camel function, (c) the first local minimization of the filled function, (d) the second local minimization of the filled function, (e) a hypothetical third local minimization of the filled function. In (b)-(e), the thicker lines and the squares correspond to the progress of the local solver. In (c)-(e) the diamond marks the currently best point.

9.5 DSDTAP on a Trivial Elastic Demand Problem

In this experiment we take a one-link network and equip it with an excess demand link for elastic demand according to the MNL model. In such a simple network we can compute the solution analytically and compare it with the numerical solution obtained by the implemented DSD solver. The purpose is to verify that the adaptation of the DSD solver for elastic demand by adding an excess demand link is correct. Consider the network in Figure 12. Denote by v the link flow on the regular link and let the cost function of the same link be t = v.

Figure 12: A one-link network augmented with an excess demand link.

Moreover, assume that there is a total travel demand of 10, which in the no-toll scenario is partitioned so that 5 take the car and 5 use public transportation; hence T = 10, A = K = 5. This means we have a no-toll scenario travel cost π^0 = 5. The inverse demand function is, according to the MNL model,

    D^{-1}(v) = π^0 + (1/α) [ ln(A/K) + ln(T/v − 1) ].

To determine α, we assume that if the travel cost is doubled, the demand will be halved, i.e.,

    2π^0 = π^0 + (1/α) [ ln(A/K) + ln(2T/A − 1) ],

which can be rewritten as

    α = (1/π^0) [ ln(A/K) + ln(2T/A − 1) ].

This gives α ≈ 0.219722. Solving the no-toll scenario problem with the DSD solver gives the solution in Table 12.

Link            Flow                Cost
Regular         4.99999999999886    4.99999999999886
Excess demand   2.50000000000114    5.00000000000207

Table 12: Link flow and cost in the numerical solution of the no-toll scenario.

If we impose a toll of 7.5 on the regular link, we get the numerical solution given in Table 13.

Link            Flow                Cost
Regular         2.50000000000101    10.000000000001
Excess demand   1.16025403784338    9.99999999999756

Table 13: Link flow and cost in the numerical solution of the 7.5 toll scenario.

From the tables we can see that in the numerical solution, when the cost is doubled, from 5 to 10, on the regular link, the travel demand is halved, from 5 to 2.5. This is in accordance with the construction of the problem, hence the DSD solver solves this problem correctly. The tolerance used in the DSD solver here was 10−7.
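The calibration of α and the halved-demand check above can be reproduced in a few lines. This sketch (in Python, not the thesis's MATLAB code) recomputes α from the derivation and verifies that D^{-1}(A) = π^0 and D^{-1}(A/2) = 2π^0:

```python
import math

# MNL dispersion parameter calibrated from the condition that doubling
# the travel cost halves the demand (T = 10, A = K = 5, pi0 = 5).
T, A, K, pi0 = 10.0, 5.0, 5.0, 5.0
alpha = (math.log(A / K) + math.log(2 * T / A - 1)) / pi0   # ~0.219722

def inverse_demand(v):
    """MNL inverse demand D^{-1}(v) from the derivation above."""
    return pi0 + (math.log(A / K) + math.log(T / v - 1)) / alpha

d0 = inverse_demand(A)        # no-toll scenario: cost pi0 = 5
d1 = inverse_demand(A / 2)    # halved demand: cost 2*pi0 = 10
```

With these inputs, α = ln(3)/5, so the construction reproduces exactly the doubling of cost from 5 to 10 observed in Tables 12 and 13.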

9.6 Time Complexity in Number of Scenarios of LMBM-B and CCA for Stochastic Extension Here we investigate the time complexity in the number of scenarios (N) of Approach I with the LMBM-B method and of Approach II with the CCA method. We use the problem SFF SCEP EXP (defined in Section 9.1.2) as test problem. LMBM-B was picked as the solver for Approach I since, being a bundle method, it is the one theoretically best suited for the problem. The problem was solved using CCA with six discretization parameters N, and using LMBM-B with eight discretization parameters. The results from the time measurements are given in Table 14. Figure 13 shows the data from the table plotted.

(a) LMBM-B              (b) CCA
N     Time [s]          N     Time [s]
1     61                1     47
2     46                2     118
4     61                4     277
8     39                8     1314
16    103               16    4407
32    163               32    38797
64    400
128   610

Table 14: Number of scenarios and elapsed time for (a) LMBM-B and (b) CCA.

Figure 13: The data from Table 14 plotted in a log-log plot (solution time as a function of the number of scenarios).

Discussion. From Table 14 and Figure 13 we can conclude that Approach II using the CCA method has a time complexity of at least O(N^3) for large N, since doubling N from N = 16 to N = 32 increases the time by roughly a factor of eight. For Approach I using LMBM-B, the table and figure indicate that the time complexity is linear.
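As a sanity check of these complexity estimates, the empirical exponent can be recovered from the measured times by a least-squares fit in log-log space. The sketch below is ours, not part of the thesis code; the times are the values as read from Table 14:

```python
import math

# Solution times read from Table 14 (assumption: values as presented there).
lmbmb = {1: 61, 2: 46, 4: 61, 8: 39, 16: 103, 32: 163, 64: 400, 128: 610}
cca = {1: 47, 2: 118, 4: 277, 8: 1314, 16: 4407, 32: 38797}

def complexity_exponent(times, tail=4):
    """Least-squares slope of log T versus log N over the last `tail` points.

    If T(N) ~ c * N**p, then log T = log c + p * log N, so the fitted
    slope estimates the exponent p.  Only the tail of the measurements
    is used, since the asymptotic behavior shows only for large N.
    """
    pts = sorted(times.items())[-tail:]
    xs = [math.log(n) for n, _ in pts]
    ys = [math.log(t) for _, t in pts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

print("LMBM-B exponent:", round(complexity_exponent(lmbmb), 2))
print("CCA exponent (last doubling):", round(complexity_exponent(cca, tail=2), 2))
```

The fitted slopes support the reading above: close to 1 for LMBM-B, and above 3 for the last doubling of the CCA series.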

9.7 Sioux Falls with Elastic Demand First-Best Toll Pricing Problem (SFE FB TP)

In this experiment we investigate the three local solvers SDBH, SNOPT and LMBM-B and compare their performance and results when applied to the first-best toll pricing problem on Sioux Falls with elastic demands (SFE FB TP, see Section 9.1.3). First-best toll pricing is a good problem to try, since we can easily compute the exact solution to this problem (see Section 6.8) and compare it with the solutions obtained numerically. In this experiment, as objective function we use the difference in social surplus (F(τ) = ∆SS) from the no-toll scenario. The following table presents the solvers and the non-default settings used for the different local solvers that we evaluate.

The toll pricing problem was first solved with LMBM-B, SDBH and SNOPT, and a toll vector solution τ̃ was retrieved for each local solver. The numbers of objective function and gradient evaluations were recorded, as well as the final optimal objective value. No separate time measurements were recorded, since the time spent by the solver logic and computations is negligible compared to the time spent solving the traffic assignment problem in every objective function evaluation. Instead NF can be used for comparing the performance of the methods.

The quality of the solutions was also checked. The system optimal traffic assignment problem was solved and the system optimal link flows v_a^SO were obtained. The marginal costs m = (m_a)_a on each link were computed according to

m_a = v_a^SO ∂t_a(v_a^SO)/∂v_a.
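For a separable BPR-type cost function, the marginal cost above has a closed form. The sketch below assumes the standard BPR form t_a(v) = t0(1 + 0.15(v/K)^4); the coefficients of the instances actually used in the experiments may differ:

```python
def bpr_travel_time(v, t0, cap):
    """Standard BPR link travel time: t(v) = t0 * (1 + 0.15 * (v / cap)**4)."""
    return t0 * (1.0 + 0.15 * (v / cap) ** 4)

def marginal_cost_toll(v, t0, cap):
    """Marginal-cost (first-best) toll m = v * dt/dv at flow v.

    Differentiating the BPR function gives dt/dv = 0.6 * t0 * v**3 / cap**4,
    so m = 0.6 * t0 * (v / cap)**4.
    """
    return 0.6 * t0 * (v / cap) ** 4
```

At v equal to the capacity, the toll is simply 0.6 t0, which is a convenient spot check of the formula.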

As mentioned in Section 6.8, τ̃ and m are not necessarily equal even if the toll pricing problem was solved perfectly, since the toll prices are only bound to be equal on route level and not necessarily on link level. Instead we can check how well the solutions satisfy Λᵀm = Λᵀτ̃. We compute the relative difference e = (e_r)_r for all routes by performing the following division elementwise:

e = Λᵀ(m − τ̃) / (1 + Λᵀm).
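Given the link-route incidence matrix Λ, this check is a one-liner; the sketch below (the function name and matrix orientation are our assumptions) uses NumPy:

```python
import numpy as np

def route_toll_discrepancy(Lambda, m, tau):
    """Relative route-level toll difference e = Λᵀ(m − τ̃) / (1 + Λᵀ m).

    Lambda is the (links × routes) link-route incidence matrix; the
    division is elementwise, and the 1 in the denominator avoids
    division by zero for routes with zero marginal-cost toll.
    """
    Lambda = np.asarray(Lambda, dtype=float)
    m = np.asarray(m, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return (Lambda.T @ (m - tau)) / (1.0 + Lambda.T @ m)
```

The maximum absolute entry of the returned vector then corresponds to the max_r e_r values reported in Table 15.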

The social cost for the system optimal solution was computed to −83826.5726. The following data was recorded in the experiment:

• Objective function value at final point (F).

• Number of function evaluations required (NF ).

• Number of gradient evaluations required (Ng).

Table 15 lists the results from the runs and the solution quality check numbers.

                     SDBH          SNOPT         LMBM-B
F                    −83739.6826   −83826.4737   −83826.5297
NF                   496           194           397
Ng                   99            194           397
max_r e_r            0.57          0.070         0.029
(F(τ̃) − F(m))/F(m)   1.0 × 10^−3   1.2 × 10^−6   5.1 × 10^−7

Table 15: Results from SDBH, SNOPT and LMBM-B on the first-best TP.

The table tells us that LMBM-B found the best solution, which was marginally better than the one SNOPT provided in terms of objective function value. In terms of tolls, LMBM-B had a maximum relative error of 2.9%. SDBH did not do nearly as well as the other two, with a maximum error of 57% in the toll solution. SNOPT required the fewest function evaluations to terminate. However, neither SNOPT nor LMBM-B exited normally. For LMBM-B, the reason for the termination criterion not being met is probably that the gradient is numerically unreliable to the requested precision at the optimal point. In the case of SNOPT, it is probably a combination of the unreliable gradient and the nonsmoothness of the objective function. For SDBH, the termination criterion simply checks for small progress and terminates; since no check is made on the norm of the gradient or projected gradient, this is the normal termination reason. Figure 14 shows a plot comparing the progress of the solvers in terms of the number of function evaluations.

Discussion. The termination criterion is a problem here. SNOPT terminates after a short while of no progress, not because the optimality conditions could be satisfied but because no progress could be made. LMBM-B continues for quite a while after its best function value has been found; the termination criterion setting "First tolerance for change in function values" (in Table 3) is probably set too low here. Both SNOPT and LMBM-B stop with a solution which is equal to the true minimum when the error margins of the objective function are taken into account. It is interesting to note that they descend approximately equally fast. SDBH produces a solution with an error that is about three orders of magnitude larger than for the other two.

Figure 14: The plot shows the difference between the social cost at a given iteration number and the system optimal social cost. SNOPT is the fastest solver in the beginning and terminates just before 200 objective function evaluations, but LMBM-B continues descending and terminates with a better solution than SNOPT. SDBH descends slowly compared to the others.

9.8 Harker and Friesz Capacity Expansion Problem (HF CEP)

9.8.1 Local and Global Optimization

We solve the capacity expansion problem on the Harker and Friesz network (HF CEP, see Section 9.1.1) with the local solvers SDBH, SNOPT, LMBM-B and CCA, and also with the global solvers NFFM, DIRECT and EGO. We compare their respective results with each other and with the best external solution found. All local solvers from Approach I were run twice with different settings: LMBM-B with different initial points, SDBH with different initial maximum step length, and SNOPT with different Hessian Frequency.

The external solution was found with the algorithm EDO, originally presented in [HB98], and run on the same problem in [Chi05; JP07]; it is presented for comparison. The solution values were scanned from the two articles, and in one of the articles the value for one variable is missing in the table. The best objective function value reported is 512.013 (from [Chi05]), but it was found that no solution gives this function value, so the reported value is not correct; evaluating the objective at the scanned solution instead gives F(ρ) = 535.227. The second smallest reported value, 525.871 (from [JP07]), has been verified to be correct, and it was chosen.

Table 16 lists the non-default settings used by the different solvers, and also defines names for the runs of the local solvers. The following data was recorded in the experiment:

• Final point (ρ).

• Objective function value at final point (F).

• Number of function evaluations required (NF) for the Approach I solvers.

• Number of iterations (NI) for the Approach II solver CCA.

• Time required (T). A question mark (?) indicates that no time measurement was made. For the Approach I solvers, the solution time is roughly proportional to the number of objective function evaluations (NF).

Table 17 lists the results, where ρ0 is the initial value of the expansion parameters, ρU is the upper bound for ρ, and ρdefault is the value of a variable ρ if its entry is blank in the table or if there is no entry for it; only the parameters differing from ρdefault are listed. Note that F has been recalculated for the EDO column.

Solver       Setting                        Value
LMBM-B (1)   Initial Point                  All zeros (default)
LMBM-B (2)   Initial Point                  All equal to four
SDBH (1)     Initial Steplength Factor      8 (default)
SDBH (2)     Initial Steplength Factor      1
SNOPT (1)    Hessian Frequency              10 (default)
SNOPT (2)    Hessian Frequency              0
DIRECT       Maximum Function Evaluations   100000

Table 16: Non-default settings for solvers on HF CEP.

The relative duality gap of the CCA solution is 1.1 × 10^−10 (see Section 4.4 for a definition of the relative duality gap).

Discussion. All three Approach I local solvers can be configured to find two different local minima. In the case of LMBM-B, we change the initial point; for SDBH we change the initial maximum step length; for SNOPT, we change the Hessian Frequency parameter (i.e., the number of iterations after which the Hessian approximation is reset to the identity matrix). This experiment shows that even such a simple problem as HF CEP can have multiple local minima, and that it is difficult to know how to configure the solvers to make them converge to the best possible one. It also shows that LMBM-B needs a significantly lower number of objective function evaluations, and that its final objective function value is at least as good as for the others. This means either that it is more efficient at finding a local minimum, or that its stopping criterion is more efficient. CCA finds exactly the same point as LMBM-B (2), which is the best one known. Compared to the external source, the local solvers find a better point at a lower cost if configured properly.

None of the global solvers find a better solution than the local solvers. NFFM seems to get stuck at the local minimum with F ≈ 530, and EGO also fails here. DIRECT comes quite close to the best point found by the local solvers. In the next section, an investigation of the reasons behind the failure of NFFM will be made. Here follows a conjecture on why EGO fails. The reason is probably that there are too many dimensions that do not contribute at all to improving the objective function. As can be seen in Table 17, there are many zeros in the solutions, and EGO has found some of them. However, this kind of variable is bad for EGO, since it works by sampling the search space, doing global search by sampling far from already sampled points; adding one variable to a problem forces EGO to spread its samples out much more. In the best and second best known solutions to this problem, there are 10 zeros out of 16 variables, and they are the same zeros, indicating that these ten dimensions do not contribute at all to improving the objective function value. This also means that only a subset of the whole search space is interesting to explore. EGO performs global search by sampling far away from points already sampled, so once the interesting subset has been sampled a few times, there will be other points in unexplored areas that are more attractive than those in the small subset of interesting points.

9.8.2 Investigation of Failure of NFFM on HF CEP

NFFM fails to find the better minimum (ρ*) in the experiment in Section 9.8, despite the fact that the first found minimizer (ρ*_1) is close to it and they differ only in variable ρ2 (see Table 17). When k = 2, the start position for the local search of the filled function is ρ*_1 + δ_D e_2, where e_2 is the positive direction

Solver     SDBH (1)   SDBH (2)   SNOPT (1)   SNOPT (2)   LMBM-B (1)   LMBM-B (2)   CCA
ρ0         0          0          0           0           0            4            0
ρU         20         20         20          20          20           20           20
ρdefault   0          0          0           0           0            0            0
ρ2         4.6067     0.4579     4.6144      0.3464      0.3463       4.6144       4.6144
ρ3         9.9133     9.8022     9.9102      9.9108      9.9104       9.9104       9.9104
ρ6         7.3667     8.1298     7.3749      7.3714      7.3738       7.3738       7.3738
ρ8         0.6134     0.5921     0.5936      0.5926      0.5923       0.5922       0.5922
ρ12
ρ14        1.2872     1.3178     1.3164      1.3171      1.3152       1.3152       1.3152
ρ15
ρ16        20.000     20.000     20.000      20.000      20.000       20.000       20.000
F          522.6445   529.9492   522.6439    529.8093    529.8093     522.6439     522.6439
NF         65         56         51          117         26           18           –
NI         –          –          –           –           –            –            3
T          ?          ?          ?           ?           3.2 s        2.4 s        3.2 s

Solver     NFFM         DIRECT     EGO         EDO
ρ0         0            –          –           0
ρU         20           20         20          20
ρdefault   0            0.004572   0           0.005
ρ2         0.3536       4.6228     4.3743      4.616
ρ3         9.8808       9.8720     7.3736      12.341
ρ6         7.4937       7.4120     17.0593     7.659
ρ8         0.6173       0.5898                 0.593
ρ12                                2.8928      0.019
ρ14        1.3212       1.3123     0.4315      1.314
ρ15                                16.1513
ρ16        20.000       19.9954    11.2042     19.995
F          529.8129     522.8408   571.6196    525.871
NF         10895 (16)   100000     118 (108)   466

Table 17: Results for local and global solvers on HF CEP and external solution from EDO. The number in parentheses is the number of the iteration at which the best function value was found.

along variable ρ2 and δ_D is the NFFM disturbance constant. For some value of µ, NFFM should be able to escape the basin corresponding to ρ*_1 and go to the basin of ρ*. But why doesn't NFFM encounter the basin of ρ*? The following experiment and discussion will try to answer that question.

The objective function was evaluated at a number of points in the interval between ρ*_1 and ρ* (with some additional points outside the interval). The resulting plot is presented in Figure 15, together with plots of the filled function for µ ∈ {1, 0.5, 0.25} on the same interval.

Discussion. As can be seen from the plots, the minimum originally at 1 drifts to the right and probably leaves the original basin of ρ*. All points between ρ* and ρ*_1 are on the boundary of Ω, since many variables are optimally equal to zero (recall that all links are subject to a possible expansion). This is a problem when using the filled function method, since the third term of the filled function P(ρ, ρ*_1, µ) in (45), which is supposed to make the basins of better optima attractive, is equal to zero at the boundary of the region. The third term is

(1/µ) min[0, max{f(ρ) − f(ρ*_1), g_i(ρ), i = 1, . . . , m}],

which we can see is equal to zero whenever any g_i(ρ) = 0. If our two points were in the interior of Ω, the minimum originally at 1 in the plots would not drift as far, since the third term would become increasingly negative as µ decreased and it would dominate the distance term −‖ρ − ρ*_1‖, which in our case is making the minimum drift to the right in the plots.
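The boundary effect can be checked numerically. The sketch below evaluates the third term in isolation, assuming the feasibility convention g_i(ρ) ≤ 0 (our assumption; the function is illustrative, not the thesis implementation):

```python
def filled_third_term(mu, f_rho, f_rho1, g_values):
    """Third term of the filled function:
        (1/mu) * min(0, max(f(rho) - f(rho1*), g_1(rho), ..., g_m(rho))).

    With feasibility expressed as g_i(rho) <= 0, an active constraint
    g_i(rho) = 0 makes the inner max nonnegative, so the whole term
    vanishes and no basin of a better optimum is made attractive there.
    """
    inner = max([f_rho - f_rho1] + list(g_values))
    return min(0.0, inner) / mu

# Interior point with a better objective value: the term is negative and
# grows in magnitude as mu decreases.
assert filled_third_term(1.0, 2.0, 5.0, [-1.0, -0.5]) == -0.5
assert filled_third_term(0.25, 2.0, 5.0, [-1.0, -0.5]) == -2.0
# Same point, but with one active constraint (on the boundary): the term is zero.
assert filled_third_term(0.25, 2.0, 5.0, [0.0, -0.5]) == 0.0
```

This reproduces, in miniature, why the attraction toward the basin of ρ* disappears along the boundary slice shown in Figure 15.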

Figure 15: NFFM on HF CEP: plots of (a) the original function and (b)–(d) the filled function with µ = 1, 0.5 and 0.25 between two found local minima of the original function. At 0 we have the first minimum, and the second is at 1.

This investigation invites another experiment in which all variables at the boundaries are removed, i.e., we keep variables 2, 3, 6, 8 and 14 and fix variable 16 to 20 (see Table 17).

9.8.3 Global Optimization on HF 2 CEP

This experiment is motivated by the experiment in Section 9.8.2. We take problem HF 2 CEP, i.e., the problem where only links 2, 3, 6, 8 and 14 can be expanded and the expansion of link 16 is fixed to 20. We want to see whether or not NFFM and EGO can make any progress on this problem, for which our known solutions are not on the boundary. All default settings for LMBM-B and NFFM are used (see Tables 3 and 4), as are all default settings for EGO (see Table 5). The following data was recorded in the NFFM experiment:

• Found local minima (ρ).

• Value of F at each local minimum.

• Values of µ and k used when each minimum was found.

• Number of function evaluations (NF ) required before each minimum was found, and also after the whole run was finished.

The results of the optimization using NFFM are presented in Table 18. Two local minima were found. The following data was recorded in both the NFFM and EGO experiments:

• Final point (ρ).

• Objective function value at final point (F ).

• Number of function evaluations required (NF) to finish the run, as well as the number of function evaluations required before the best solution was found (within parentheses in the result table).

The data above is presented in Table 19.

Point no.         1          2          Finish
NF                17         51         1432
µ                 –          1
k                 –          2
F                 542.6797   522.6439
Expansion pars.
ρ2                4.6144     4.6144
ρ3                4.5268     9.9104
ρ6                15.5769    7.3738
ρ8                0.5922     0.5922
ρ14               1.3152     1.3153

Table 18: Results from NFFM with LMBM-B as subsolver on HF 2 CEP.

Solver   NFFM        EGO
ρ0       0           –
ρU       20          20
ρ2       4.6144      4.6143
ρ3       9.9104      9.9080
ρ6       7.3738      7.3761
ρ8       0.5922      0.5851
ρ14      1.3153      1.3126
F        522.6439    522.6439
NF       1432 (51)   78 (73)

Table 19: Results for NFFM and EGO on HF 2 CEP.

Discussion. For NFFM, the experiment indicates that there was a problem with the active constraints in the experiment in Section 9.8.1, since the best-known optimum is found by NFFM on HF 2 CEP in this experiment. EGO also performs very well on this problem, indicating that unnecessary variables are bad for EGO. This is of course related to the fact that EGO suffers from the curse of dimensionality.

9.8.4 Stochastic Optimization

This section contains a series of experiments made on a stochastic version of the capacity expansion problem on the Harker and Friesz network (HF SCEP, see Section 9.1.1). A comparison of the convergence rates between the two discretization schemes presented in Section 6.7, SAA (Monte Carlo) and Simpson's rule, will be made. The comparison is performed both for the expectation value objective function ((SMPEC)_N) and the conditional value-at-risk objective function ((SRPEC)_N). The solutions of both the expectation value problem and the conditional value-at-risk problem will also be illustrated by means of histograms.

The two problems HF SCEP EXP and HF SCEP CVAR (β = 0.9) were discretized both with SAA and with Simpson's rule, for values of N (number of scenarios) equal to 1, 2, 4, 8, 16, 32, 64, 128 and 256 for

SAA, and 3, 5, 9, 17, 33, 65, 129, 257 and 513 for Simpson's rule. For both problems (EXP and CVAR), the SAA-discretized problem was solved eight times for each discretization parameter N with different samples. For Simpson's rule, which chooses its samples deterministically, the two problems were solved only once for each N. LMBM-B was the local solver, and as starting point, the best point from the local optimization runs in Section 9.8 was used (the one produced by LMBM-B (2) in Table 17). The settings for LMBM-B were the defaults. The following data was recorded for each run:

• Final stochastic objective function value (FˆN or GˆN ).

The results are presented in Table 20. As "exact" solutions, F̂_513 and Ĝ_513 from Simpson's rule were chosen. The absolute error for each problem, discretization scheme and N was computed using the above definition of the exact solution, and plotted in Figure 16. Also HF SCEP CVAR (β = 0.8) was solved, but without collecting information for checking the convergence. Histograms of the deterministic objective function for the solutions (from using Simpson's rule with 513 scenarios) of HF SCEP EXP and of HF SCEP CVAR with β = 0.8, β = 0.9 and β = 0.99 can be seen in Figure 17.
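For reference, the conditional value-at-risk of a discrete scenario set can be computed directly from the Rockafellar–Uryasev-type representation used for the CVaR objective; the sketch below is our own helper, not the thesis code:

```python
def cvar(costs, weights, beta):
    """Conditional value-at-risk of a discrete cost distribution.

    Uses the representation
        CVaR_beta = min_gamma { gamma + (1/(1-beta)) * sum_i w_i * max(0, c_i - gamma) }.
    The objective is piecewise linear and convex in gamma, with breakpoints
    at the scenario costs, so the minimum is attained at one of them and it
    suffices to evaluate the expression there.
    """
    def objective(gamma):
        tail = sum(w * max(0.0, c - gamma) for c, w in zip(costs, weights))
        return gamma + tail / (1.0 - beta)
    return min(objective(c) for c in costs)
```

For equally weighted scenarios this reduces to the average of the worst (1 − β) share of the costs, which is the quantity the CVAR solutions below trade off against the mean.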

(a) EXP, SAA         (b) EXP, Simpson's rule   (c) CVAR, SAA        (d) CVAR, Simpson's rule
N    Average F̂_N     N    F̂_N                  N    Average Ĝ_N     N    Ĝ_N
1    521.7859        3    528.35767771          1    523.4067        3    566.60979511
2    521.4253        5    528.11379573          2    535.8214        5    562.83512361
4    524.0500        9    528.09496629          4    547.0220        9    558.98304331
8    528.8611        17   528.09371135          8    560.0495        17   561.11543932
16   528.3606        33   528.09363155          16   559.7681        33   560.92153021
32   529.5556        65   528.09362636          32   561.0649        65   561.08191284
64   527.9716        129  528.09362602          64   560.2549        129  561.05458578
128  527.7230        257  528.09362611          128  561.1631        257  561.06419543
256  528.2271        513  528.09362616          256  561.5076        513  561.06318903

Table 20: Results in terms of objective function value for local minimization using LMBM-B of problem HF SCEP EXP in (a) and (b) and HF SCEP CVAR in (c) and (d), using discretization scheme SAA in (a) and (c) and Simpson's rule in (b) and (d). Note that there were eight runs for each N for the SAA discretization scheme, and the values presented in the tables are averages.


Figure 16: Absolute error of solution for (a) HF SCEP EXP and (b) HF SCEP CVAR. The solid line connects the computed absolute errors for Simpson’s rule and the dashed line is for SAA.

Figure 17: Histograms of the objective function value for the 513 scenarios from Simpson’s rule for solutions to HF SCEP EXP (thick edges) and HF SCEP CVAR (thin edges). In (a), histograms of the solutions to EXP and CVAR with β = 0.8 are presented, in (b), histograms of the solutions to EXP and CVAR with β = 0.9, and in (c), histograms of the solutions to EXP and CVAR with β = 0.99.

Discussion. From the plots in Figure 16 we can see that Simpson's rule has a higher convergence rate than SAA. Here we have only one random variable, and the integrals in (SMPEC_Ω) and (SRPEC_Ω) are computed more accurately using a quadrature rule than using Monte Carlo simulation. From Figure 16(a), which shows the convergence rates for both discretization methods on HF SCEP EXP, we can see that SAA and Simpson's rule have convergence rates O(N^−1/2) and O(N^−4) respectively, which coincides well with the theory. It is also interesting to note in that figure that the curve for Simpson's rule flattens out when reaching an absolute error of 10^−7; the reason is that we do not have higher precision than that in the objective function. In Figure 16(b) (the experiment with HF SCEP CVAR) we have the same situation for SAA, with a convergence rate of O(N^−1/2). However, for Simpson's rule, the convergence rate is roughly O(N^−2). This behavior can be derived from the fact that the integrand in (SRPEC_Ω) is nonsmooth, which makes the error term of Simpson's rule unbounded.
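The effect of the nonsmooth integrand can be reproduced on a toy integral. The sketch below compares composite Simpson's rule on a smooth integrand with the same rule on a kinked max(0, ·)-type integrand, like the one appearing in the CVaR formulation (the test integrands are our own choices, not the actual objectives):

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n equally spaced nodes (n odd, n >= 3)."""
    assert n >= 3 and n % 2 == 1
    h = (b - a) / (n - 1)
    total = f(a) + f(b)
    for i in range(1, n - 1):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3.0

smooth = math.exp                      # integral over [0, 1] is e - 1
kinked = lambda x: max(0.0, x - 0.3)   # integral over [0, 1] is 0.245; kink at 0.3

for n in (5, 9, 17, 33):
    err_smooth = abs(composite_simpson(smooth, 0.0, 1.0, n) - (math.e - 1.0))
    err_kinked = abs(composite_simpson(kinked, 0.0, 1.0, n) - 0.245)
    print(n, err_smooth, err_kinked)
# For the smooth integrand, each halving of h cuts the error by roughly 2^4;
# for the kinked integrand, the decay is much slower, as in Figure 16(b).
```

The panel containing the kink contributes an error that decays only algebraically in the step size, which is the mechanism behind the reduced O(N^−2) rate observed above.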

Figure 17 shows that there really is a difference between the solution to the expectation value problem and the conditional value-at-risk problem. For HF SCEP CVAR, there are fewer scenarios with higher cost, but in order to achieve that, the average cost has increased. A displacement of the probability mass can also be seen between the solutions with β = 0.8 and β = 0.9. The solution with β = 0.99 doesn't reduce the probability mass of the worst-case scenarios.

9.9 Sioux Falls with Fixed Demands Capacity Expansion Problem (SFF CEP)

We compare the results and performance of the local solvers SDBH, SNOPT, LMBM-B and CCA and the global solvers NFFM, DIRECT and EGO on the capacity expansion problem on Sioux Falls with fixed demand (SFF CEP, see Section 9.1.2). The results are also compared with a solution from the solver SBD in [Jos03]. Default settings were used for all solvers, with the exceptions given in Table 21.

Solver   Setting                        Value
DIRECT   Maximum Function Evaluations   10000

Table 21: Non-default settings for solvers on SFF CEP.

The following data was recorded in the experiment:

• Final point (ρ).

• Objective function value at final point (F ).

• Number of function evaluations required (NF ) for the Approach I solvers.

• Number of iterations required (NI ) for the Approach II solver CCA.

• Time required (T ). Question mark (?) indicates that no time measurement was made. For the Ap- proach I solvers, the solution time is roughly proportional to the number of function evaluations required (NF ).

The results are presented in Table 22.

Solver   SDBH      SNOPT     LMBM-B    SBD       CCA       NFFM          DIRECT        EGO
ρ0       0         0         0         0         0         0             –             –
ρU       25        25        25        25        25        25            25            25
ρ1       5.3250    5.3653    5.3392    5.3027    5.3503    5.3392        5.2984        5.6208
ρ2       2.0062    2.0302    2.0778    2.0560    2.0571    2.0778        2.1091        2.0144
ρ3       5.3566    5.3484    5.3664    5.3430    5.3766    5.3664        5.4012        5.6386
ρ4       1.9678    2.0285    2.0486    1.9901    2.0316    2.0486        2.0062        3.0628
ρ5       2.5489    2.4134    2.5015    2.5216    2.4703    2.5015        2.4177        2.4048
ρ6       2.6010    2.4918    2.5436    2.5548    2.5189    2.5436        2.6235        3.4114
ρ7       2.9195    2.9314    2.9288    2.9883    2.9289    2.9288        2.9321        3.7941
ρ8       4.8087    4.7826    4.7958    4.8559    4.8011    4.7958        4.7840        4.4831
ρ9       2.9804    2.9862    2.9889    3.0026    2.9882    2.9889        3.0350        2.9583
ρ10      4.7958    4.8277    4.7845    4.8496    4.7886    4.7845        4.7840        4.7345
F        79.9909   79.9891   79.9888   79.9961   79.9886   79.9888       79.9934       80.1936
NF       55        131       117       –         –         17017 (117)   9949 (3184)   103 (101)
NI       –         –         –         –         89        –             –             –
T        ?         ?         64 s      ?         92 s      ?             ?             ?

Table 22: Results for local and global solvers on SFF CEP and external solution from SBD.

The relative duality gap of the CCA solution is 4.3 × 10^−11 (see Section 4.4 for a definition of the relative duality gap). EGO terminated after 103 function evaluations with the message that the expected improvement was low for three iterations.

Discussion. The best solution comes from CCA, while SDBH produced its solution with the fewest objective function evaluations. The best external result found came from the SBD algorithm, which is a solver almost identical to SDBH. SNOPT gives an average solution but required the most objective evaluations. The optimal points found do not differ much, which together with the similar objective function values indicates that the solvers found the same minimum. NFFM uses LMBM-B as subsolver and hence finds the same minimum as LMBM-B, but cannot improve the solution during the 17000 additional objective function evaluations made. DIRECT and EGO do not find any better solutions than those found by the local solvers.

9.10 Stockholm Toll Pricing Problem with Cordon J2 (STHLM J2 TP)

9.10.1 Local and Global Optimization

We compare the results and performance of the local solvers SDBH, SNOPT and LMBM-B and the global solvers NFFM, DIRECT and EGO on the toll pricing problem on Stockholm with elastic demand, using cordon J2 (STHLM J2 TP, see Section 9.1.4), i.e., the cordon with two toll levels: one for the Northern toll gates and one for the Southern toll gates. Default settings were used for all solvers, with the exceptions given in Table 23.

Solver   Setting                        Value
DIRECT   Maximum Function Evaluations   1000

Table 23: Non-default settings for solvers on STHLM J2 TP.

For SNOPT and LMBM-B, the objective function and the variables were scaled down to order of magnitude 1. This meant dividing the objective function value by 10^5 and the variables by 10. This had a large impact on the solution given by LMBM-B and made it generate a better solution. For NFFM, the variables were scaled down by a factor of 10 and the objective function by a factor of 10^3. The following data was recorded in the experiment:

• Final point (τ).

• Objective function value at final point (F).

• Number of function evaluations required (NF).

The results are presented in Table 24. NFFM found in total two local minima; Table 25 presents the two minima found. A plot of the function values on the line between the two found minima is presented in Figure 18(a), where τ1 is the first and τ2 the second minimum found.
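The scaling described above can be implemented as a generic wrapper around the objective and its gradient. This is a minimal sketch under our own naming, not the actual solver interface:

```python
def scaled_problem(f, grad, x_scale, f_scale):
    """Let the solver work with y = x / x_scale and f_s(y) = f(x_scale * y) / f_scale.

    By the chain rule, the gradient seen by the solver is
    grad_s(y) = (x_scale / f_scale) * grad(x_scale * y), so both the
    variables and the objective are of order 1 near the optimum.
    """
    def f_scaled(y):
        return f([x_scale * yi for yi in y]) / f_scale

    def grad_scaled(y):
        return [(x_scale / f_scale) * gi for gi in grad([x_scale * yi for yi in y])]

    return f_scaled, grad_scaled
```

For STHLM J2 TP and the SNOPT/LMBM-B runs above, this corresponds to x_scale = 10 and f_scale = 10^5.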

Solver   SDBH         SNOPT        LMBM-B       NFFM         DIRECT       EGO
τ0       15           15           15           0            –            –
τU       30           30           30           30           30           30
τ1       23.8035      23.8368      23.8400      23.8346      23.8203      23.5453
τ2       21.7004      21.7314      21.7354      21.7303      21.7215      21.4539
F        −260552.17   −260552.59   −260552.57   −260553.08   −260551.15   −260514.49
NF       52           67           46           1309 (217)   999 (710)    200 (84)

Table 24: Results for SDBH, SNOPT and LMBM-B on STHLM J2 TP.

Discussion. The results tell us that the three local methods find the same minimum in approximately the same number of function evaluations. The difference in objective function value between the solutions is within the error margins (see Tables 9 and 10). It is interesting to note that the first minimum found by NFFM had F ≈ −260031; it soon advanced to one with a function value similar to the others'. Figure 18 shows that there actually are two local minima close to each other. In the NFFM run, the initial point was the origin and not (15, 15). LMBM-B is the subsolver of NFFM here, and hence LMBM-B was stuck in the worse local minimum τ1 when the origin was used as initial point.

Point no.         1            2            Finish
NF                22           117          1309
µ                 –            1
k                 –            3
F                 −260031.38   −260553.08
Expansion pars.
τ1                24.4919      23.8346
τ2                21.3864      21.7303

Table 25: Results from NFFM with LMBM-B as subsolver on STHLM J2 TP.

Figure 18: The STHLM J2 TP objective function. In (a), a slice plot for points between τ1 and τ2, where τ1 corresponds to x = 0 and τ2 to x = 1 in the plot. In (b), a contour plot of the objective function close to the two minima is shown. The two minima τ1 and τ2 are marked with squares, and the slice is marked by the line between them.

9.10.2 Stochastic Optimization

Here we perform stochastic optimization on the Stockholm network. The capacity of the link Essingeleden (marked by an asterisk in the map in Figure 9) is in this problem subject to stochastic perturbations. The name of the problem is STHLM J2 STP EXP/CVAR; see Section 9.1.4 for a full description. Three values of β are considered: 0.8, 0.9 and 0.95.

Based on the experiences from the experiment in Section 9.8.4 we use Simpson's rule again, since we have a single uniformly distributed random variable, and we discretize with N = 65. Solving the traffic assignment problem of the elastic Stockholm network takes about 10 seconds per scenario, which in this case means about 10 minutes per objective function evaluation. This is the reason for the relatively low value of N. The value N = 65 was used during optimization, while for producing the histograms we set N = 513. The problem was solved using LMBM-B with the default settings.

The results from the optimization are presented in Table 26. The development of the objective function value as N increases from 3 to 65 is presented in the table to indicate the rate of convergence. Note that N_F̂ and N_Ĝ are the numbers of evaluations required only for the last run, with N = 65; this run had the initial point τ0 given in the table. The results from STHLM J2 STP CVAR with β = 0.8 and β = 0.95 are only presented as histograms in Figure 19. For generating the histograms in Figure 19 of the objective function value distribution for EXP and CVAR, 513 scenarios were generated at the optimal points and objective function values were computed for each scenario.

Discussion. From the histograms in Figure 19 and the difference between the solutions to EXP and CVAR we can see that the minimization gives a solution in agreement with what could be expected. On

            (a) EXP      (b) CVAR β = 0.8   (c) CVAR β = 0.9   (d) CVAR β = 0.95
N           65           65                 65                 65
τ0          20           (24.21, 21.39)     20                 (24.21, 21.39)
γ0          –            −260000            −260000            −260000
τ1          23.6866      24.0628            24.2101            24.1388
τ2          21.7582      21.4115            21.3916            21.2242
γ           –            −255998.8          −254231.7          −253116.0
F̂3 / Ĝ3    −259349.0    –                  −252046.3          –
F̂5 / Ĝ5    −259277.5    –                  −252766.8          –
F̂9 / Ĝ9    −259572.7    –                  −253535.6          –
F̂17 / Ĝ17  −259573.0    –                  −253083.1          –
F̂33 / Ĝ33  −259572.0    –                  −253121.0          –
F̂65 / Ĝ65  −259572.1    −254090.5          −253101.0          −252578.0
N_F̂ / N_Ĝ  34           14                 61                 66

Table 26: Results for LMBM-B on STHLM J2 STP EXP/CVAR.

[Figure 19 appears here: three histogram panels of frequency versus objective function value F (×10^5), over the range −2.65 to −2.5: (a) EXP and CVAR (β = 0.8), (b) EXP and CVAR (β = 0.9), (c) EXP and CVAR (β = 0.95).]

Figure 19: Histograms of the objective function value for the 513 scenarios from Simpson's rule for the solutions to STHLM J2 STP EXP (thick edges) and STHLM J2 STP CVAR (thin edges). Panel (a) shows the solutions to EXP and CVAR with β = 0.8, panel (b) with β = 0.9, and panel (c) with β = 0.95.

the CVAR histograms we can identify a hump of probability mass that is "sacrificed" and displaced to higher objective function values in order to reduce the probability mass at the worst-case scenarios. The number of function evaluations needed is about the same as for the deterministic case. It is also interesting to note that the toll fee on the Northern gates (τ1) should be raised and the toll fee on the Southern gates (τ2) lowered when minimizing the conditional value-at-risk objective instead of the expectation value objective.

9.11 Anaheim Capacity Expansion Problem (ANA CEP)

We compare the results and performance of the local solvers SDBH, SNOPT, LMBM-B and CCA and the global solvers NFFM, DIRECT and EGO on the capacity expansion problem on Anaheim with fixed demand (ANA CEP; see Section 9.1.5). Default settings were used for all solvers, with the exceptions given in Table 27. The local Approach I solvers were run twice with default settings but with two different initial points: either the default (all zeros) or all components equal to 10.

Solver   Setting                        Value
DIRECT   Maximum function evaluations   10000

Table 27: Non-default settings for solvers on ANA CEP.

The following data was recorded in the experiment:

• Final point (ρ).

• Objective function value at final point (F).

• Number of function evaluations required (NF) for the Approach I solvers.

• Number of iterations required (NI) for the Approach II solver CCA.

• Time required (T). A question mark (?) indicates that no time measurement was made. For the Approach I solvers, the solution time is roughly proportional to the number of function evaluations required (NF).

The objective function value of the network without expansions is 14199.14. The results from the experiment are presented in Table 28. The suffix (1) means that the initial point was all zeros, and (2) that all components were equal to 10. The relative duality gap of the CCA solution is 2.3 × 10^−10 (see Section 4.4 for a definition of the relative duality gap). From the results we can identify three potential local minima with F(ρ) ∈ {13919.11, 13919.37, 13926.29}. The first two might seem to be the same minimum when looking only at the objective function value, since the difference is so small. However, when comparing ρ5 and ρ10 between the solutions, it is apparent that the two points are quite far from each other in terms of ρ. We define the three points ρ1, ρ2 and ρ3 as the solutions to SDBH (1), SDBH (2) and LMBM-B (1), respectively. To determine how the three solutions relate to each other, the objective function was plotted along lines between ρ1 and the two other points. The plots are presented in Figure 20.
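This kind of slice plot is produced by evaluating the objective along the line segment between two candidate solutions. A minimal sketch (the helper name `line_slice` is ours, not from the thesis implementation, and `F` stands in for the objective):

```python
import numpy as np

def line_slice(F, rho_a, rho_b, num=51):
    """Evaluate F((1 - t) * rho_a + t * rho_b) for t in [0, 1].
    If the values rise between the endpoints, the two candidate
    solutions are separated by a barrier of higher objective values
    and hence are distinct local minima."""
    ts = np.linspace(0.0, 1.0, num)
    values = np.array([F((1.0 - t) * rho_a + t * rho_b) for t in ts])
    return ts, values
```

Plotting `values` against `ts` then reveals whether the segment crosses a ridge separating the two points.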

Discussion. We start by analyzing the three distinct solutions found. Figure 20 shows that the three solutions really are three isolated local optima; even ρ1 and ρ2 are distinct minima. LMBM-B (1) converged to ρ3, which has an objective value roughly 7 higher than the best one found. For our best solution, the reduction in social cost was about 280 (14199.14 − 13919.11), of which 7 is 2.5%. ρ1 and ρ2 can therefore be considered considerably better solutions than ρ3. It is difficult to draw any conclusions regarding the performance of the solvers. CCA is slower than the Approach I based local solvers on this problem. The plots in Figure 20 suggest that the objective function has a large number of local minima, making it easy for a local solver to get stuck in one of them.

Solver   SDBH (1)   SDBH (2)   SNOPT (1)  SNOPT (2)  LMBM-B (1)  LMBM-B (2)
ρ0       0          10         0          10         0           10
ρU       20         20         20         20         20          20
ρ1       10.5252    10.4627    10.5263    10.4740    10.3743     10.4703
ρ2       6.1674     6.1680     6.1711     6.1457     3.2134      6.1618
ρ3       4.0772     4.1908     4.0778     4.1884     4.0880      4.1833
ρ4       4.9400     4.9269     4.9533     4.9156     4.9565      4.9247
ρ5       3.5703     2.7137     3.5621     2.7110     3.5966      2.7045
ρ6       3.9481     4.1144     3.9520     4.1094     3.9423      4.1011
ρ7       2.2255     2.1748     2.1883     2.2403     2.2131      2.1529
ρ8       3.1068     3.3010     3.1059     3.3020     3.2840      3.3377
ρ9       1.8938     1.8975     1.9002     1.8550     2.0477      1.8809
ρ10      3.2641     4.5938     3.2601     4.6415     2.8072      4.8502
F        13919.11   13919.37   13919.11   13919.37   13926.29    13919.38
NF       57         56         60         79         31          18
T        ?          ?          353 s      ?          260 s       130 s

Solver   CCA        NFFM          DIRECT         EGO
ρ0       0          0             –              –
ρU       20         20            20             20
ρ1       10.5258    10.5235       10.7407        10.6213
ρ2       6.1711     6.1506        6.2963         5.9314
ρ3       4.0762     4.0889        4.0741         3.8879
ρ4       4.9533     4.9535        4.8148         4.7654
ρ5       3.5604     3.5616        3.3333         2.4087
ρ6       3.9539     3.9517        4.0741         3.4084
ρ7       2.1850     2.1869        2.5926         2.1684
ρ8       3.1016     3.1064        3.3333         2.9799
ρ9       1.9043     1.9245        1.8519         1.4977
ρ10      3.2725     3.4023        3.3333         5.1653
F        13919.11   13919.11      13919.62       13920.05
NF       –          5089 (84)     10251 (6636)   100 (96)
NI       58         –             –              –
T        1242 s     ?             ?              ?

Table 28: Results for local and global solvers on ANA CEP.

NFFM, which is based on LMBM-B, finds the best point known after 84 function evaluations. Neither DIRECT nor EGO finds a better point than those already found, indicating that the jungle of local minima that we are investigating is the one containing the global minimum. NFFM suffers heavily performance-wise from performing an exhaustive search over different values of µ and k without gaining in solution quality.

[Figure 20 appears here: two plots of slices of F(ρ) (objective function value, ×10^4, approximately 1.3919 to 1.3927) against the linear combination parameter from 0 to 1: (a) F(ρ) for ρ between ρ1 and ρ2, (b) F(ρ) for ρ between ρ1 and ρ3.]

Figure 20: The ANA CEP objective function values for points between (a) ρ1 and ρ2, and (b) ρ1 and ρ3. In both plots, ρ1 corresponds to x = 0 and ρ2 or ρ3 to x = 1.

9.12 Barcelona with Fixed Demand (BARC)

We investigate the feasibility of solving a network design problem on the Barcelona network with fixed demand (BARC; see Section 9.1.6). For Approach I, solving the traffic assignment problem for the original setup (without any design modifications) takes approximately 10 minutes for a relative duality gap less than 10^−7. It has also been noted that some setups (e.g., capacity expansions on some links) make the convergence of the traffic assignment solver even slower. This network is comparable in terms of network size with the Chicago sketch network, which is solved in less than eight seconds with TAPAS ([BGbl]). For Approach II, the network design problem is to determine the optimal capacity expansion on link 2077, where the expansion is constrained between 0 and 1 and the investment cost is 10^5 times the capacity expansion. When CCA was applied to this problem, it had after twenty-four hours advanced 24 iterations and reached a relative duality gap of 4.99 × 10^−4. At this point the run was terminated. The vast majority of the time was spent solving the master problem.

Discussion. For Approach I, using DSDTAP as traffic assignment solver does not give sufficiently accurate solutions in reasonable time to be usable for solving a network design problem. On the other hand, if DSDTAP is substituted with a different traffic assignment solver that handles the Barcelona network (e.g., OBA or TAPAS), then Approach I could probably be used on this network. Approach II also hit a wall on this problem, since it took twenty-four hours to reach a relative duality gap of 4.99 × 10^−4 (which means that the traffic assignment solution is of poor quality).

10 Discussion

In this section the results of the numerical experiments are summarized and discussed. Also, a brief discussion on the theorems on uniqueness and strong regularity of networks with OD connectors and BPR functions is presented.

Do global methods find significantly better solutions than local methods? From the experiments in Sections 9.8 to 9.11 we can see that, in general, our local solvers (SDBH, SNOPT, LMBM-B, CCA) perform quite well on these problems compared to the global solvers. Starting from the origin or the midpoint as initial point, the local Approach I solvers often reach the best-known point. Sometimes they get stuck at some other local minimum, but it is seldom much worse than the best-known solution. The Approach II solver found the best-known solution with high accuracy for all three problems it was tested on.

On the other hand, the experiments show (especially for STHLM J2 TP and ANA CEP) that there may be many local solutions in the vicinity of the best-known solution to which the local solvers can converge. To avoid this in practice, one could first use a global heuristic solver to obtain a starting point for a local solver, which then can converge to a minimizer. It is important to note that this study was performed on four instances of the network design problem, and they all showed quite different characteristics. For example, there was no sign of more than one local minimum for the SFF CEP problem, while ANA CEP and HF CEP evidently had many. It is therefore very difficult to say whether any results and conclusions about the solution quality from a local solver on these four problems can be generalized to other problem instances.

How can one solve stochastic network design problems? In the experiment in Section 9.6 the time complexity in the number of scenarios was measured for Approach I and Approach II. It was shown that Approach II suffers from a time complexity of O(N^3), where N is the number of scenarios, while Approach I is only O(N), i.e., linear. Therefore, Approach II was discarded as a solver for the stochastic problem. The only previously published numerical solution of a stochastic network design problem of this kind is presented in [CP10b], where only a very small-scale problem is considered. In other words, there is no other source with which to compare our numerical results. Instead, the solutions presented here can serve as a reference for future experiments on stochastic network design problems. From the discussion of the histograms in Sections 9.8.4 (HF SCEP) and 9.10.2 (STHLM J2 STP) we can at least conclude that Approach I with LMBM-B, in the case of CVAR, successfully reduces a measure of conditional value-at-risk. We have only one stochastic variable in our experiments, making Simpson's rule much more efficient than Monte Carlo simulation. Regardless of the choice of discretization scheme, the time complexity in the number of scenarios for Approach I is linear: not only does the time for each objective function evaluation scale linearly, but so does the full optimization procedure. Comparing the number of objective function evaluations needed for solving the deterministic problem STHLM J2 TP (Table 24) with the number needed for solving the stochastic problem STHLM J2 STP (Table 26), we can see that there is no increase in the case of the expectation value objective.
However, it has been noted that it is more difficult to solve the conditional value-at-risk problems, probably because only a few of the samples are included in the objective function (due to the max-operator in the integrand) and because a nondifferentiable point with respect to the new variable γ is introduced. Additionally, the objective function values and gradients for the N scenarios are independent and can be computed in parallel, which is of great importance for practical and efficient implementations.
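The discretized conditional value-at-risk objective referred to here has the Rockafellar-Uryasev form ([RU00]); a minimal sketch, with illustrative argument names rather than the thesis implementation, makes the source of the nonsmoothness visible:

```python
import numpy as np

def cvar_objective(gamma, losses, weights, beta):
    """Discretized Rockafellar-Uryasev CVaR objective:
        gamma + (1 / (1 - beta)) * sum_k w_k * max(loss_k - gamma, 0).
    Minimizing over gamma yields CVaR_beta of the discrete loss
    distribution. The max-operator zeroes out every scenario whose
    loss lies below gamma and introduces a nondifferentiable point
    in gamma."""
    excess = np.maximum(losses - gamma, 0.0)
    return gamma + float(np.dot(weights, excess)) / (1.0 - beta)
```

Only scenarios with losses above γ contribute to the sum, which is why relatively few of the samples enter the objective and its gradient.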

How would the tested solvers perform on large-scale networks? The largest network used in our experiments is Barcelona, with 2522 links, and in the experiment in Section 9.1.6 it was concluded that Approach I failed on this problem, because DSDTAP failed (or converged too slowly), and that Approach II progressed very slowly. Despite these disappointing results, a discussion on the feasibility of the two approaches for large-scale problems follows. For the Approach I experiments, elapsed time for solving a problem has not been considered an important measure, because time measurements depend heavily on the implementation of the algorithm and on the computer hardware on which the experiment is performed. Instead, the number of objective function evaluations has been counted, since for Approach I the majority of the computation time is spent solving the traffic assignment problem. From the result tables (Tables 17, 22, 24 and 28) for HF CEP, SFF CEP, STHLM J2 TP and ANA CEP we can see that the number of objective function evaluations is always of the order of 100 and seems to be independent of problem type and size. If this holds for larger networks (which is very probable, since the difference in size between HF CEP and ANA CEP is large), the traffic assignment solver determines the run time. In our experiments, the DSD algorithm DSDTAP was used for solving the traffic assignment problem. However, better methods are available. One example is the TAPAS algorithm ([BGbl]), which solves the Chicago sketch network (about the same size as the Barcelona network) to an average excess error of 10^−10 in less than eight seconds on current hardware. Using TAPAS as traffic assignment solver instead should allow large-scale network design problems to be solved.
Further, thanks to the possibility of parallelizing the scenario computations for the stochastic network design problem, one can solve large-scale network design problems with stochastics on a cluster with a few hundred nodes using Approach I.

A major problem with Approach I is the lack of a good termination criterion for local optimization, as indicated in Section 9.7. Since we are dealing with a nonsmooth problem, the criterion cannot be based directly on the norm of the gradient. Instead, bundle methods can be used, which certainly also rely on gradient data. It is hence important that the numerically computed gradient is accurate, and since the gradient computation is based on the traffic assignment solution, it is important that the traffic assignment solution is accurate. Close to nondifferentiable points it is likely that routes are misclassified when using any of the rules mentioned in Section 7.2 if the traffic assignment solution is inaccurate, and this may result in inaccurate numerical gradients.

Approach II is a very different method; each iteration involves adding a constraint to a nonlinear program and solving that nonlinear program. In our implementation, SNOPT has been used as the solver for that program. The time measurements in Tables 17, 22 and 28 for HF CEP, SFF CEP and ANA CEP are 3.2 s, 92 s and 1242 s, respectively. Optimization on the Barcelona network did not finish within twenty-four hours. Also, the method does not scale well for discretized stochastic problems when more scenarios are added. It is important to note that this is based on an implementation of the CCA algorithm using already existing general-purpose solvers. Specializing this software towards our application could improve the efficiency of the algorithm.
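As a complement to the discussion above: because the per-scenario traffic assignment solves in Approach I are independent, one objective function evaluation can be distributed over workers. A minimal sketch using Python threads, where `solve(x, xi)` is a hypothetical stand-in for one traffic assignment solve (in a real setting each solve would be an external process or a cluster job):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_scenarios(solve, x, scenarios, workers=4):
    """Evaluate the scenario objectives solve(x, xi) concurrently.
    Since the scenarios are independent, the results are identical to
    those of a sequential loop, while wall time scales roughly with
    N / workers when the underlying solver runs outside the
    interpreter (e.g., external native code)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda xi: solve(x, xi), scenarios))
```

The returned list preserves the scenario order, so the Simpson or CVaR weighting can be applied to it directly afterwards.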

11 Conclusions

Using an implicit programming approach to the network design problem has potential to be a feasible approach for local solution of both large-scale and stochastic network design problems, since specialized software for fast and accurate solution of the traffic assignment problem exists (e.g., OBA, TAPAS ([BG02; BGbl])). For the stochastic problems, the implicit programming approach lends itself to intuitive and simple parallelization when computing the traffic assignment solutions of the stochastic scenarios. A drawback of the implicit programming approach is that very high accuracy of the traffic assignment solution is needed to obtain accurate numerical gradients.

The cutting constraint method has worse time complexity properties, both in network size and in the number of stochastic scenarios. However, to make this method more attractive, the solver of the master problem could perhaps be specialized for our problem class, instead of using a general nonlinear programming solver (SNOPT in our case). The cutting constraint method produced very accurate results and always obtained the best-known solution, and hence showed good convergence properties.

References

[ATW07] M. Anitescu, P. Tseng, and S. J. Wright. "Elastic-Mode Algorithms for Mathematical Programs with Equilibrium Constraints: Global Convergence and Stationarity Properties". In: Mathematical Programming 110 (2007), pp. 337–371.
[BG02] H. Bar-Gera. "Origin-Based Algorithm for the Traffic Assignment Problem". In: Transportation Science 36 (2002), pp. 398–417.
[BG10] H. Bar-Gera. Transportation Network Test Problems. Mar. 2010. URL: http://www.bgu.ac.il/~bargera/tntp/.
[BGbl] H. Bar-Gera. "Traffic Assignment by Paired Alternative Segments". In: Transportation Research Part B: Methodological (to appear; published online 2010). DOI: 10.1016/j.trb.2009.11.004.
[Big74] N. Biggs. Algebraic Graph Theory. Cambridge, UK: Cambridge University Press, 1974.
[BLO05] J. V. Burke, A. S. Lewis, and M. L. Overton. "A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization". In: SIAM Journal on Optimization 15 (2005), pp. 751–779.
[BMW56] M. Beckmann, C. B. McGuire, and C. B. Winsten. Studies in the Economics of Transportation. New Haven, CT, USA: Yale University Press, 1956.

[Bpr] Traffic Assignment Manual. Bureau of Public Roads. Washington, D.C., USA, 1964.
[Chi05] S.-W. Chiou. "Bilevel Programming for the Continuous Transport Network Design Problem". In: Transportation Research Part B: Methodological 39 (2005), pp. 361–383.
[Cla83] F. H. Clarke. Optimization and Nonsmooth Analysis. New York, NY, USA: John Wiley & Sons, 1983.
[CP10a] C. Cromvik and M. Patriksson. "On the Robustness of Global Optima and Stationary Solutions to Stochastic Mathematical Programs with Equilibrium Constraints, Part 1: Theory". In: Journal of Optimization Theory and Applications 144 (2010), pp. 461–478.
[CP10b] C. Cromvik and M. Patriksson. "On the Robustness of Global Optima and Stationary Solutions to Stochastic Mathematical Programs with Equilibrium Constraints, Part 2: Applications". In: Journal of Optimization Theory and Applications 144 (2010), pp. 479–500.
[CZ08] E. K. P. Chong and S. H. Zak. An Introduction to Optimization. Hoboken, NJ, USA: John Wiley & Sons, 2008.
[DeM+05] V. DeMiguel et al. "A Two-Sided Relaxation Scheme for Mathematical Programs with Equilibrium Constraints". In: SIAM Journal on Optimization 16 (2005), pp. 587–609.
[Eks08] J. Ekström. "Designing Urban Road Congestion Charging Systems: Models and Heuristic Solution Approaches". Lic. thesis. Department of Science and Technology, Linköping University, Sweden, 2008.
[FL04] R. Fletcher and S. Leyffer. "Solving Mathematical Programs with Complementarity Constraints as Nonlinear Programs". In: Optimization Methods and Software 19 (2004), pp. 15–40.
[Fle+06] R. Fletcher et al. "Local Convergence of SQP Methods for Mathematical Programs with Equilibrium Constraints". In: SIAM Journal on Optimization 17 (2006), pp. 259–286.
[Flo00] C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2000.
[FW56] M. Frank and P. Wolfe. "An Algorithm for Quadratic Programming". In: Naval Research Logistics Quarterly 3 (1956), pp. 95–110.
[Ge90] R. Ge. "A Filled Function Method for Finding a Global Minimizer of a Function of Several Variables". In: Mathematical Programming 46 (1990), pp. 191–204.
[GMS05] P. E. Gill, W. Murray, and M. A. Saunders. "SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization". In: SIAM Review 47 (2005), pp. 99–131.
[GMW86] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. London, UK: Elsevier Academic Press, 1986.
[Haa04] M. Haarala. "Large-Scale Nonsmooth Optimization: Variable Metric Bundle Method with Limited Memory". PhD thesis. Department of Mathematical Information Technology, University of Jyväskylä, Finland, 2004.
[Han92] E. R. Hansen. Global Optimization Using Interval Analysis. New York, NY, USA: Dekker, 1992.
[HB98] H.-J. Huang and M. G. H. Bell. "Continuous Equilibrium Network Design Problem with Elastic Demand: Derivative-Free Solution Methods". In: Transportation Networks: Recent Methodological Advances. Ed. by M. G. H. Bell. Amsterdam, The Netherlands: Pergamon, 1998, pp. 175–193.
[HF84] P. T. Harker and T. L. Friesz. "Bounding the Solution of the Continuous Equilibrium Network Design Problem". In: Proceedings of the 9th International Symposium on Transportation and Traffic Theory. Ed. by J. Volmuller and R. Hamerslag. Utrecht, The Netherlands: VNU Science Press, 1984, pp. 233–252.
[HMM07] N. Haarala, K. Miettinen, and M. M. Mäkelä. "Globally Convergent Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization". In: Mathematical Programming 109 (2007), pp. 181–205.
[HY08] D. Han and H. Yang. "The Multi-Class, Multi-Criterion Traffic Equilibrium and the Efficiency of Congestion Pricing". In: Transportation Research Part E 44 (2008), pp. 753–773.

[Jon95] S. Jonsson. Kort beskrivning av V/D-funktioner för tätortsgator och -vägar baserade på mätmaterialet i TU71. Tech. rep. In Swedish. Regionplane- och trafikkontoret, Stockholm, 1995.
[Jos03] M. Josefsson. Sensitivity Analysis of Traffic Equilibria. MSc thesis. Department of Mathematics, Chalmers University of Technology, Sweden, 2003.
[JP07] M. Josefsson and M. Patriksson. "Sensitivity Analysis of Separable Traffic Equilibria, with Application to Bilevel Optimization in Network Design". In: Transportation Research Part B: Methodological 41 (2007), pp. 4–31.
[JPS93] D. R. Jones, C. D. Perttunen, and B. E. Stuckman. "Lipschitzian Optimization without the Lipschitz Constant". In: Journal of Optimization Theory and Applications 79 (1993), pp. 157–181.
[JSW98] D. R. Jones, M. Schonlau, and W. Welch. "Efficient Global Optimization of Expensive Black-Box Functions". In: Journal of Global Optimization 13 (1998), pp. 455–492.
[Kar07] N. Karmitsa. LMBM – FORTRAN Subroutines for Large-Scale Nonsmooth Minimization: User's Manual. Tech. rep. Turku Centre for Computer Science, Finland, 2007.
[KDB09] A. Kadrani, J.-P. Dussault, and A. Benchakroun. "A New Regularization Scheme for Mathematical Programs with Complementarity Constraints". In: SIAM Journal on Optimization 20 (2009), pp. 78–103.
[KMbl] N. Karmitsa and M. M. Mäkelä. "Limited Memory Bundle Method for Large Bound Constrained Nonsmooth Optimization: Convergence Analysis". In: Optimization Methods and Software (to appear; published online 2009). DOI: 10.1080/10556780902842495.
[LH04] S. Lawphongpanich and D. W. Hearn. "An MPEC Approach to Second-Best Toll Pricing". In: Mathematical Programming 101 (2004), pp. 33–55.
[LLCN06] S. Leyffer, G. Lopez-Calva, and J. Nocedal. "Interior Methods for Mathematical Programs with Complementarity Constraints". In: SIAM Journal on Optimization 17 (2006), pp. 52–77.
[LM85] A. V. Levy and A. Montalvo. "The Tunneling Algorithm for the Global Minimization of Functions". In: SIAM Journal on Scientific and Statistical Computing 6 (1985), pp. 15–29.
[LMP75] L. J. LeBlanc, E. K. Morlok, and W. P. Pierskalla. "An Efficient Approach to Solving the Road Network Equilibrium Traffic Assignment Problem". In: Transportation Research 9 (1975), pp. 309–318.
[LP92] T. Larsson and M. Patriksson. "Simplicial Decomposition with Disaggregated Representation for the Traffic Assignment Problem". In: Transportation Science 26 (1992), pp. 4–17.
[LP98] T. Larsson and M. Patriksson. "Side Constrained Traffic Equilibrium Models – Traffic Management Through Link Tolls". In: Equilibrium and Advanced Transportation Modelling. Ed. by P. Marcotte and S. Nguyen. Dordrecht, The Netherlands: Kluwer Academic Press, 1998, pp. 125–151.
[LS04] X. W. Liu and J. Sun. "Generalized Stationary Points and an Interior-Point Method for Mathematical Programs with Equilibrium Constraints". In: Mathematical Programming 101 (2004), pp. 231–261.
[McF70] D. McFadden. "Conditional Logit Analysis of Qualitative Choice Behaviour". In: Frontiers in Econometrics. Ed. by P. Zaremba. New York, NY, USA: Academic Press, 1970, pp. 105–142.
[MP07] P. Marcotte and M. Patriksson. "Traffic Equilibrium". In: Handbooks in Operations Research and Management Science. Ed. by C. Barnhart and G. Laporte. Vol. 14. Amsterdam, The Netherlands: Elsevier Science, 2007, pp. 623–713.
[OKZ98] J. Outrata, M. Kocvara, and J. Zowe. Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1998.
[Pat04] M. Patriksson. "Sensitivity Analysis of Traffic Equilibria". In: Transportation Science 38 (2004), pp. 258–281.
[Pat08] M. Patriksson. "On the Applicability and Solution of Bilevel Optimization Models in Transportation Science: A Study on the Existence, Stability and Computation of Optimal Solutions to Stochastic Mathematical Programs with Equilibrium Constraints". In: Transportation Research Part B: Methodological 42 (2008), pp. 843–860.

[Pat94] M. Patriksson. The Traffic Assignment Problem. Topics in Transportation. Utrecht, The Netherlands: VSP BV, 1994.
[Ral08] D. Ralph. "Mathematical Programs with Complementarity Constraints in Traffic and Telecommunications Networks". In: Philosophical Transactions. Mathematical, Physical and Engineering Sciences 366 (2008), pp. 1973–1987.
[RB05] A. U. Raghunathan and L. T. Biegler. "An Interior Point Method for Mathematical Programs with Complementarity Constraints (MPCCs)". In: SIAM Journal on Optimization 15 (2005), pp. 720–750.
[RDK88] G. Rose, M. S. Daskin, and F. S. Koppelman. "An Examination of Convergence Error in Equilibrium Traffic Assignment Models". In: Transportation Research Part B: Methodological 22 (1988), pp. 261–274.
[RU00] R. T. Rockafellar and S. Uryasev. "Optimization of Conditional Value-at-Risk". In: Journal of Risk 2 (2000), pp. 21–41.
[SB98] C. P. Stephens and W. Baritompa. "Global Optimization Requires Global Information". In: Journal of Optimization Theory and Applications 96 (1998), pp. 575–588.
[Sha03] A. Shapiro. "Monte Carlo Sampling Methods". In: Handbooks in Operations Research and Management Science. Ed. by A. Ruszczynski and A. Shapiro. Vol. 10. Amsterdam, The Netherlands: Elsevier Science, 2003, pp. 353–425.
[She85] Y. Sheffi. Urban Transportation Networks: Equilibrium Analysis with Mathematical Programming Methods. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc., 1985.
[SLL02] J. G. Siek, L.-Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Boston, MA, USA: Addison-Wesley Professional, 2002.
[SS00] H. Scheel and S. Scholtes. "Mathematical Programs with Complementarity Constraints: Stationarity, Optimality, and Sensitivity". In: Mathematics of Operations Research 25 (2000), pp. 1–22.
[YL09] Y. Yin and S. Lawphongpanich. "Alternative Marginal-Cost Pricing for Road Networks". In: Netnomics 10 (2009), pp. 77–83.
[ZXZ09] Y. Zhang, Y. Xu, and L. Zhang. "A Filled Function Method Applied to Nonsmooth Constrained Global Optimization". In: Journal of Computational and Applied Mathematics 232 (2009), pp. 415–426.

A Network data

A.1 Anaheim

Ext. group  Included links
1   2, 102, 103, 104, 106, 107, 134, 135, 137, 286, 288, 289, 290, 291, 292, 293, 296, 297, 298, 299, 301, 302, 304, 305, 307, 308, 310, 311, 312, 782
2   86, 184, 185, 187, 188, 189, 190, 192, 193, 195, 196, 198, 199, 200, 201, 202, 203, 205, 206, 207, 209, 210, 808, 854, 856, 858, 859, 861, 862, 867
3   3, 108, 110, 111, 113, 114, 115, 116, 119, 120, 121, 218, 219, 220, 221, 223, 224, 226, 227, 228, 230, 231, 232, 234, 379, 382, 395, 551, 593, 594
4   4, 97, 169, 326, 328, 329, 331, 332, 333, 335, 336, 337, 338, 339, 341, 342, 344, 346, 347, 348, 349, 350, 351, 352, 353, 479, 527, 528, 621, 622
5   118, 123, 124, 125, 126, 127, 128, 130, 131, 132, 133, 150, 151, 152, 153, 154, 156, 157, 159, 214, 215, 217, 392, 417, 422, 423, 474, 475, 477, 478
6   1, 138, 139, 140, 141, 142, 143, 144, 147, 148, 170, 171, 172, 173, 175, 176, 179, 180, 181, 183, 428, 429, 431, 432, 480, 481, 482, 493, 580, 583
7   6, 88, 252, 253, 254, 255, 256, 257, 315, 316, 317, 318, 319, 320, 324, 365, 366, 367, 369, 370, 372, 373, 841, 881, 887, 890, 892, 894, 895, 898
8   136, 258, 260, 262, 263, 264, 265, 266, 267, 269, 270, 271, 272, 273, 274, 275, 276, 279, 280, 281, 282, 284, 285, 434, 440, 645, 726, 786, 791, 794
9   7, 313, 314, 321, 322, 323, 327, 375, 376, 377, 378, 811, 815, 819, 820, 821, 823, 828, 829, 832, 836, 837, 838, 840, 851, 869, 874, 877, 879, 906
10  92, 98, 101, 161, 162, 163, 164, 166, 167, 354, 356, 357, 358, 359, 360, 361, 362, 363, 364, 467, 507, 573, 612, 616, 654, 655, 666, 670, 703, 707

Table 29: Links included in the ten expansion groups for ANA CEP.

i    1    2    3    4    5    6    7    8    9    10
ci   1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

Table 30: Investment cost constants for ANA CEP.
