This article was downloaded by: [128.173.125.76] On: 21 February 2014, At: 12:09 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA

INFORMS Journal on Computing

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

An Effective Deflected Subgradient Optimization Scheme for Implementing Column Generation for Large-Scale Airline Crew Scheduling Problems

Shivaram Subramanian, Hanif D. Sherali

To cite this article: Shivaram Subramanian, Hanif D. Sherali (2008) An Effective Deflected Subgradient Optimization Scheme for Implementing Column Generation for Large-Scale Airline Crew Scheduling Problems. INFORMS Journal on Computing 20(4):565-578. http://dx.doi.org/10.1287/ijoc.1080.0267

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval. For more information, contact [email protected]. The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service. Copyright © 2008, INFORMS


INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

INFORMS Journal on Computing, Vol. 20, No. 4, Fall 2008, pp. 565–578. ISSN 1091-9856, EISSN 1526-5528. doi 10.1287/ijoc.1080.0267. © 2008 INFORMS

An Effective Deflected Subgradient Optimization Scheme for Implementing Column Generation for Large-Scale Airline Crew Scheduling Problems

Shivaram Subramanian
Enterprise Optimization, United Airlines (WHQKB), Elk Grove Village, Illinois 60007, [email protected]

Hanif D. Sherali
Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, [email protected]

We present a new deflected subgradient scheme for generating good quality dual solutions for linear programming (LP) problems and utilize this within the context of large-scale airline crew planning problems that arise in practice. The motivation for the development of this method came from the failure of a black-box-type approach implemented at United Airlines for solving such problems using column generation in concert with a commercial LP solver, where the software was observed to stall while yet remote from optimality. We identify a phenomenon called dual noise to explain this stalling behavior and present an analysis of the desirable properties of dual solutions in this context. The proposed deflected subgradient approach has been embedded within the crew pairing solver at United Airlines and tested using historical data sets. Our computational experience suggests a strong correlation between the dual noise phenomenon and the quality of the final solution produced, as well as with the accompanying algorithmic performance. Although we observed that our deflected subgradient scheme yielded an average speed-up factor of 10 for the column generation scheme over the commercial solver, the average reduction in the optimality gap over the same number of iterations was better by a factor of 26, along with an average reduction in the dual noise by a factor of 30. The results from the column generation implementation suggest that significant benefits can be obtained by using the deflected subgradient-based scheme instead of a black-box-type or standard solver approach to solve the intermediate linear programs that arise within the column generation scheme.

Key words: airline crew planning; set-partitioning problems; deflected subgradient approach; dual noise; variable target value method

History: Accepted by William Cook, former Area Editor for Design and Analysis of Algorithms; received February 2006; revised April 2007; accepted November 2007.
Published online in Articles in Advance May 15, 2008.

1. Introduction

Consider the following discrete optimization problem, which represents a set-partitioning problem with side constraints (SPP-SC) (see Nemhauser and Wolsey 1999):

$$\text{SPP-SC:}\quad \begin{aligned} \text{Minimize}\quad & c^T x \\ \text{subject to}\quad & Ax = e \\ & Dx \le d \\ & x \in B^n. \end{aligned} \tag{1}$$

Here, the variables $x \in B^n$ represent binary decision variables, with an associated cost vector $c$. The first set of equality constraints in problem SPP-SC represent set-partitioning constraints with the coefficients $a_{ij}$ of the $m \times n$ matrix $A$ being either zero or one, depending on whether variable $x_j$ participates in constraint $i$ or not, and where $e$ is a vector of ones. The second set of some $r$ inequality restrictions are side constraints that arise out of certain designated goals pertaining to alternative multiobjective requirements, beyond the cost-minimization objective of SPP-SC.

A typical airline planning instance of SPP-SC is the multiobjective airline crew pairing optimization problem, where the columns $A_j$ of $A$ are generated to satisfy delineated crew pairing legality restrictions. More specifically, this problem seeks an optimal assignment of crew pairings to different flight segments, subject to various contractual and FAA-mandated crew scheduling regulations. In practice, airlines are interested in finding a (near-optimal) solution that would satisfy certain utilization, manpower availability, and quality-of-work-life (QWL) requirements such as schedule regularity, in addition to minimizing crew costs that include hotel, per diem, and traditional metrics such as flight-time-credit (FTC), and a variety of


other segment and duty-based penalties and incentives. Vance et al. (1997) provide more details on these crew pairing costs, and Klabjan et al. (2001, 2002) consider such crew pairing problems with additional regularity and time-window restrictions, and also include some aspects related to the aircraft-routing problem.

We will restrict the scope of our study in this paper to the pure set-partitioning problem SPP, whose structure poses the main challenge in solving large-scale instances to near optimality. The current analysis can be readily extended to deal with the more restricted and realistic model SPP-SC. In practice, to maintain feasibility, we add nonnegative slack variables $s$, along with a corresponding large positive undercoverage penalty $P$ to the objective function, where $P$ is typically chosen to be considerably larger than the cost of any legal pairing that can be generated for the given problem. This results in the following augmented formulation of SPP:

$$\text{SPP:}\quad \begin{aligned} \text{Minimize}\quad & c^T x + Ps \\ \text{subject to}\quad & Ax + s = e \\ & s \ge 0,\; x \in B^n. \end{aligned} \tag{2}$$

We now present a brief summary of relevant crew pairing concepts used in crew scheduling applications. A crew duty represents an approximate day's worth of work and consists of a legal sequence of flight segments and flight connections. Layovers represent rest periods between successive duty schedules, and these accrue both crew and hotel costs. A crew pairing is a sequence of crew duties, along with any intermediate layovers, which begins and ends at the same crew domicile. Duties and layovers have to satisfy a variety of FAA-mandated and contractual crew rules, such as "8-in-24" restrictions (no more than eight hours of flying time in any rolling 24-hour window), night-flying limitations, and maximum duty limits. For a more detailed discussion of crew duties, crew rules, connections, and layovers, we refer the reader to Vance et al. (1997) and the FAA website (http://www.faa.gov).

Typically, the universe of crew pairings is huge and may run in the order of trillions for even modest problem sizes. A standard solution approach for crew scheduling problems is to first use a column generation-based procedure to solve a linear programming relaxation of SPP and to then continue solving the underlying discrete optimization problem using branch-and-price techniques (Barnhart et al. 1998) along with follow-on fixing strategies (Vance et al. 1997). Makri and Klabjan (2004) describe some of the computational challenges associated with column generation approaches for solving practical crew scheduling applications. Also, the Carmen Systems group of Boeing, Inc. reports results that demonstrate their considerable success in solving massively sized railroad crew scheduling problems using column generation techniques (Kohl 2003).

Given the nature of the problem and various other business requirements, a sequential planning process has been traditionally followed, starting with a gross-day approximation where every flight is assumed to operate every day, and ending with a final solution that represents crew pairings that cover each of the flight segments by date. However, recent advances in computer hardware and new column generation algorithms and techniques have made it possible to obviate traditional sequential and suboptimal crew planning approaches, and to directly solve the multiobjective crew scheduling problem in its entirety within a reasonable amount of computational time. Furthermore, this success has enabled the solution of large optimization problems that arise from integrated airline planning models that cross traditional boundaries between different airline business areas.

The problem instances we consider in this paper are from weekly and monthly crew pairing problems that pertain to United Airlines's flight schedules. These are large-scale SPP instances that directly address a typical week's abstracted problem or the scheduling problem for an entire month. The latter choice is made when a large number of dated exceptions are present in the schedule, which usually occurs during holiday seasons or during months that contain several scheduling changes. In fact, the intensely competitive nature of the airline industry has typically resulted in an increase in the frequency of such demand-driven dynamic scheduling changes to optimally match supply and demand. Typically, the number of flight segments in such data sets ranges from 5,000 to 15,000 after applying preprocessing techniques. As an additional complication, we also allow limited "dead-heading," wherein crew members are transported between stations, usually on commercial flights, allowing for more connection options in generating feasible pairings. In the problem instances we solve, the segments from the entire United Airlines's flight schedule for the month are available as potential candidates for dead-heading. Note that the dead-head segments for a crew pairing do not participate in the coverage constraints, but only affect the associated cost coefficient.

Another aspect of problem SPP in practice is that the size of the problem alone does not determine the difficulty of the problem. The resulting nonlinearity and structure of crew rules, the cost-objective structure, the trade-off between competing objectives, the sparsity of the flight segment network, peaking effects due to the hub-and-spoke airline structure, among

other problem aspects, can greatly influence the fractionality of the relaxations, run times, and storage requirements.

1.1. Motivation for Our Study

In the column generation (CG) process for solving such large-scale crew pairing optimization problems, we frequently observed the phenomenon of stalling that practically results in the early termination of the CG scheme, without finding even a feasible solution, when used in conjunction with a commercial barrier or simplex-based (LP) solver.

The stalling typically originates early in the solution procedure, when the LP solution after the first few CG iterations turns out to be integral or nearly integral. After such an iteration, due to the inherent degeneracy in the problem, the LP objective does not improve, even though the subproblem quite easily generates tens of thousands of negative reduced-cost columns. There have been several papers published in the literature that deal with schemes to accelerate CG techniques for SPP instances, but we have not found any work that analyzes the properties of the optimal dual solution in this context. Among related research, Kohl (2003) presents computational results that demonstrate the efficacy of using subgradient-based approaches in commercial crew scheduling applications. Hu and Johnson (1999) address stalling and convergence issues with their CG scheme for airline crew pairing problems and propose an enhanced subproblem approach that seeks to generate an improved dual solution at every iteration. Merle et al. (1999) propose a CG stabilization scheme to overcome […] problem. Elhallaoui et al. (2005) suggest a dynamic constraint aggregation method that reduces the number of set-partitioning constraints and present results in the context of urban mass-transit scheduling. Amor et al. (2006) attempt to generate deep dual-optimal cuts for improving the effectiveness of their CG procedure, and test this approach on the cutting-stock problem.

Another practical limitation with subgradient-based schemes proposed in the literature is the number of algorithmic parameters that need to be fine-tuned for the sake of an industrial implementation. Typically, the maintenance and updating of such parameter-intensive methods diminishes over the years, leading to a drop in solution quality, which often leads practitioners to revert to suboptimal schemes based on off-the-shelf LP solvers.

The main contributions of this work are to provide insights into the desirable characteristics of a dual solution that can overcome the observed stalling problem alluded to above, and to develop an efficient deflected subgradient-based optimization strategy having relatively few parameters that can reasonably achieve such a dual solution in practice.

The remainder of this paper is organized as follows. Section 2 presents the overall CG scheme and discusses the stalling phenomenon in greater detail. Section 3 describes a procedure for solving the master program. Sample computational results are presented in §4, and §5 concludes the paper with a summary and directions for future research.

2. Column Generation Scheme

In lieu of solving problem SPP directly, we first solve a continuous relaxation, R-SPP, of the problem using a CG technique (see Nemhauser and Wolsey 1999) and then use a branch-and-price heuristic technique to find an acceptable integer solution to problem SPP. In this CG approach for solving R-SPP, the master problem, RMP, is a linear program that is a restriction of R-SPP, where only a subset of up to some $n$ […] that attempts to generate up to some $p \ge 1$ pairings that have the lowest reduced costs, $c_j - (\pi^l)^T A_j$, where each column $A_j$ of $A$ satisfies all the mandated crew pairing legality requirements. If no (or few) such negative reduced-cost columns exist, the CG procedure terminates. Otherwise, the new columns are added to the restricted master program, and RMP is reoptimized to find a new dual vector. The model RMP that

results at the end of the CG procedure is chosen as the root node problem for solving the discrete optimization problem SPP. A branching scheme that guarantees convergence for 0–1 set-partitioning problems is the "follow-on" fixing technique based on a modified Ryan-Foster's rule (Vance et al. 1997). A branch-and-price framework is typically used to solve SPP, where the pricing module again utilizes the CG scheme to find good integer solutions to SPP. The overall scheme of the CG procedure is described below.

Step 1. Set $l = 1$, and $s_i = 1$ and $\pi_i = P$ $\forall\, i = 1, \ldots, m$.

Step 2. Solve the subproblem SP($l$) to generate up to some $p \ge 1$ negative reduced cost columns and add these to RMP. If fewer than some $p_{\min}$ such columns are generated, terminate the procedure; otherwise, proceed to Step 3.

Step 3. Set $l \leftarrow l + 1$. Solve RMP, possibly only to near optimality, to generate a good quality dual solution $\pi^l$. If $\|\pi^l - \pi^{l-1}\|/m < \varepsilon$, for some tolerance $\varepsilon > 0$, or if $l = l_{\max}$, a specified maximum number of iterations, terminate the CG procedure, and if necessary, re-solve RMP to optimality to find an optimal primal-dual solution using any suitable LP solver, and enter the branch-and-price routine. Otherwise, go to Step 2.

2.1. Dual Noise Phenomenon in Column Generation

Consider a large instance of SPP in the absence of the dead-heading option. In this case, every feasible pairing will have to cover at least two segments. Now, consider a CG scheme that produces an integer-feasible LP solution $\bar{x}$ after a few iterations. Let us assume that all the slack variables take values of zero in this solution, or equivalently, assume that we have preprocessed the input data set to eliminate all uncoverable flight segments from consideration to ensure the existence of such an $\bar{x}$ for the resulting SPP instance. Such an integral LP solution to this problem is also an extreme point LP solution having at most $m/2$ basic variables at unity (because each $A_j$-column has at least two nonzero entries), and therefore, at least $m/2$ basic variables are equal to zero. This situation represents a highly degenerate extreme point, and we shall show that the quality of the corresponding dual-optimal solution and the choice of the subproblem methodology for generating columns can greatly inhibit any improvement in the LP objective in subsequent iterations of the CG scheme.

Toward this end, let $J$ represent the index set of the unit integer-valued variables in an optimal solution to Problem RMP, where $|J| \le m/2$. Let us examine the characteristics of the dual solution under this situation by analyzing RD, the dual to problem RMP:

$$\text{RD:}\quad \begin{aligned} \text{Maximize}\quad & \sum_{i=1}^{m} \pi_i \\ \text{subject to}\quad & \sum_{i=1}^{m} a_{ij}\pi_i \le c_j \quad \forall\, j = 1, \ldots, n \\ & \pi_i \le P \quad \forall\, i = 1, \ldots, m \\ & \pi \text{ unrestricted.} \end{aligned} \tag{4}$$

Note that RD is a relaxation of the dual problem associated with the unrestricted master program that includes all possible columns because only a subset of all possible dual constraints are considered here. Let us define the set $S_j$ to be the index set of the rows of SPP that contain the variable $x_j$. Because $\bar{x}$ is an optimal solution to RMP, any dual-optimal solution $\pi^*$ to RD is characterized by the following linear system, denoted $\Pi^*_J$:

$$\begin{aligned} \sum_{i \in S_j} \pi^*_i &= c_j \quad \forall\, j \in J \\ \sum_{i \in S_j} \pi^*_i &\le c_j \quad \forall\, j \notin J \\ \pi^*_i &\le P \quad \forall\, i = 1, \ldots, m. \end{aligned} \tag{5}$$

As an illustrative example, consider the following four-segment SPP that arises from a gross-day crew pairing problem, where the rows are covered after the first column generation iteration. Suppose that this iteration adds exactly two pairings containing the first two and last two segments, respectively, with identical high costs of $P/2$. Therefore, the set of dual-optimal solutions to RMP at this iteration satisfies

$$\pi^*_1 + \pi^*_2 = P/2 \tag{6a}$$
$$\pi^*_3 + \pi^*_4 = P/2 \tag{6b}$$
$$\pi^*_i \le P, \quad i = 1, \ldots, 4. \tag{6c}$$

One possible solution that satisfies these conditions is given by

$$\pi^*_1 = \pi^*_3 = P/4 + \varepsilon \tag{6d}$$
$$\pi^*_2 = \pi^*_4 = P/4 - \varepsilon \tag{6e}$$

where

$$-\frac{3P}{4} \le \varepsilon \le \frac{3P}{4}. \tag{6f}$$

In particular, setting $\varepsilon = 3P/4$ gives us $\pi^*_1 = \pi^*_3 = P$, and $\pi^*_2 = \pi^*_4 = -P/2$, which is an extreme point dual-optimal solution. Suppose now that we proceed to the next step of solving the subproblem with this dual solution and generate two new pairings $x_3$ and $x_4$,

with the first pairing having a cost of $c_3$ and covering segments 1 and 3, and the second pairing having a cost of $c_4$ and covering segments 2 and 4. Hence, $x_3$ will have a reduced cost of $c_3 - 2P$, and $x_4$ will have a reduced cost of $c_4 + P$. This implies that any pairing of the type $x_3$ will have a negative reduced cost and will enter the matrix as long as $c_3 < 2P$, a large number, whereas $x_4$ will not enter the matrix unless $c_4$ is highly negative ($< -P$), which is almost unrealizable in practice. In particular, $c_3 = P$ and $c_4 = 0$ will result in the interesting situation where $x_3$ enters the matrix, leading to no improvement in the objective (although it cuts off the current optimal dual solution). Let us assume that there exists an additional legal pairing $x_5$ having a cost of zero that covers all the segments in the sequence 2, 4, 1, and 3. The partial path of $x_5$ containing its first two segments resembles $x_4$ and is a candidate that is likely to be eliminated by the heuristic CG subproblem schemes that employ labeling-based resource-constrained shortest-path methods in practice (see Desrosiers et al. 1995), because after the stage of generating the first two segments, it will have accumulated an unfavorably high reduced-cost estimate.

Figure 1 illustrates the highly nonlinear fluctuation in reduced-cost estimates of partial pairings observed within the subproblem heuristic associated with a historical United Airlines weekly crew scheduling data set. The figure shows the variation in the reduced-cost estimate after each label extension that adds another flight segment to the pairing. Thicker lines delineate pairings that exhibit extreme variations. Each of the pairings shown in the figure successfully priced out with a negative reduced cost at the end of the labeling heuristic.

This example illustrates the potential pitfalls of a black-box-type application of a commercial LP solver within an implementation of such a dynamic CG procedure. We term this phenomenon dual noise, whereby certain inordinately high dual component magnitudes dominate the reduced-cost estimate of a partial pairing, rendering the original objective cost component insignificant in comparison and resulting in poor columns being generated. Note also that in the presence of significantly high dual noise, the subproblem scheme would need to process a significantly higher number of partial pairings to add any meaningful subset of columns to the matrix that might lead to an improvement in the objective value. Furthermore, as explained in the sequel, the addition of columns driven by dual noise tends to generate weak inequalities in the dual space, which results in a near-stationary dual solution as well.

For the large crew scheduling problems we tackle here, the cardinality of $J$ is typically close to $m/5$, leading to substantially higher degeneracy and dual noise. In such instances that we solved for United Airlines, the black-box solver approach had to be abandoned due to a persistent stalling of the objective value. Furthermore, in a typical dynamic CG implementation, whenever we reach the preset column storage limit, columns having high reduced costs are replaced by those that are newly generated. Doing so creates further problems because the presence of dual noise clouds the quality of columns, rendering such column replacement schemes ineffective.

Figure 1 Progress in the Reduced-Cost Estimate of Partial Pairings Within the Subproblem Heuristic (vertical axis: estimated reduced cost, from −300 to 500; horizontal axis: segments in partial pairing, from 1 to 14)

In practice, the reduction of dual noise allows us to significantly reduce the preset storage limit, without any adverse impact on the quality of the solution to SPP.

In general, interior point dual solutions seem to better suit column generation schemes in the context of SPP when compared with extreme point solutions obtained via the simplex method. Furthermore, dual solutions obtained via subgradient optimization procedures have yielded better results in large SPP instances when compared with those obtained using a commercial optimization software package. A similar conclusion has been independently observed in other commercial crew pairing applications on the basis of several empirical tests (Hjorring 2004) and in vehicle-routing and staffing problems (Gunluk et al. 2004).

From the previous example, it is clear that not all dual-optimal solutions are equally desirable. For example, a choice of $\varepsilon = 0$ in the foregoing illustration would have resulted in all components of the dual solution taking a positive value of $P/4$. This choice significantly improves the possibility of generating better columns via the subsequent subproblem and, consequently, achieving a nonzero improvement in the LP objective. Interestingly, this dual solution also has the smallest Euclidean norm among all alternative dual-optimal solutions. (For brevity, "norm" and $\|\cdot\|$ will denote the Euclidean norm throughout the paper.) A key observation is that, in practice, we would like to work with dual solutions that have norms of low magnitude. It is our experience that minimal norm dual solutions lead to reduced dual noise. This can be intuitively explained as follows. We would like to improve the effectiveness of the subproblem heuristic by actively seeking dual solutions that do not dominate the reduced-cost estimates of partial pairings, and thereby limit the occurrence of large partial reduced-cost variations over the labeling steps. Moreover, we would like to generate columns that lead to deep cuts in the dual space. Denoting $\Pi^*$ as the set of optimal dual solutions to a current RMP, note that we would ultimately like to find a $\pi \in \Pi^*$ such that for some $j$, we obtain a negative reduced cost $c_j - \pi^T A_j = c_j - \sum_{i \in S_j} \pi_i$. Generating such a negative reduced-cost column and adding $\sum_{i \in S_j} \pi_i \le c_j$ to the dual problem would yield a dual cut. We would like to cut off as much of the current $\Pi^*$ as possible, and therefore, we would like to find an enterable column index $j$ for which $\max_{\pi \in \Pi^*} \{c_j - \sum_{i \in S_j} \pi_i\} < 0$; i.e., $c_j - \min_{\pi \in \Pi^*} \sum_{i \in S_j} \pi_i < 0$. Hence, noting that the components of $\pi$ tend to be nonnegative at optimality due to the covering ($\ge$) tendency of the set-partitioning constraints, if we use a $\pi \in \Pi^*$ having relatively small components that yet generates a negative reduced-cost column, then this will likely yield a strong cut giving a dual ascent. Noting that the dual objective function is to maximize $e^T \pi$, we could therefore choose to find an optimal dual solution that, among alternative optima, minimizes $\|\pi\|^2$; i.e., we could consider solving

$$\text{MDN:}\quad \begin{aligned} \text{Minimize}\quad & \|\pi\|^2 \\ \text{subject to}\quad & \pi \in \Pi^*. \end{aligned} \tag{7}$$

Problem MDN may be burdensome to use in practice. Instead, we can find a near-optimal solution to the following dual quadratic program, DQP, which is the dual to RMP that is augmented by a quadratic term, where $\mu$ is a small positive penalty on the dual norm:

$$\text{DQP:}\quad \begin{aligned} \text{Maximize}\quad & e^T \pi - \frac{\mu}{2}\|\pi\|^2 \\ \text{subject to}\quad & A^T \pi \le c \\ & \pi \le Pe \\ & \pi \text{ unrestricted.} \end{aligned} \tag{8}$$

We can also thereby measure the dual noise for a selected dual solution $\pi^*$ using the dual noise factor $\eta$, given by

$$\eta = \frac{\|\pi^*\|}{\|\pi^*_{\min}\|}, \tag{9}$$

where $\pi^*_{\min}$ is a dual-optimal solution having the smallest Euclidean norm. We expect that a subproblem heuristic that uses any dual-optimal (or near-optimal) solution having a relatively small value of $\eta$ to generate negative reduced-cost columns will result in relatively strong cuts in the dual space.

Note that problem DQP is a concave QP maximization problem. In lieu of problem DQP, we can equivalently examine its Dorn dual formulation to obtain the following QP extension of RMP (see Bazaraa et al. 2006, for example):

$$\text{MQP:}\quad \begin{aligned} \text{Minimize}\quad & c^T x + P e^T s + \frac{\mu}{2} z^T z \\ \text{subject to}\quad & Ax + s + \mu z = e \\ & (x, s) \ge 0,\; z \text{ unrestricted.} \end{aligned} \tag{10}$$

Observe that problem RMP can be viewed as a special case of MQP with $\mu = 0$. Solving MQP using some positive $\mu$ is equivalent to adding a perturbation $\mu z$ to the constraints in (3) and including a quadratic penalty term in the objective that limits such a perturbation. This form is related to ideas that attempt to stabilize column generation. Merle et al. (1999) propose an alternative perturbation scheme and penalize the rectilinear norm of the perturbation vector. Likewise, Barnes et al. (2002) propose a nonnegative least squares (NNLS) problem that aims to minimize a perturbation measure over the set of active constraints. Also, interestingly, Gunluk et al. (2004) report an improvement in the performance of their CG procedure for a vehicle-routing problem by using a heuristic

discounting scheme that effectively reduces the magnitude of the optimal dual vector.

Here, we choose to solve the Lagrangian duals to RMP and MQP using a new deflected subgradient optimization method that is presented below. Note that the optimality conditions for MQP with respect to the z-variable column determine z, so that, similar to the case of RMP, we obtain a Lagrangian dual subproblem that involves only the (x, s)-variables. The deflected subgradient scheme that we propose not only yields near-optimal dual solutions to RMP and MQP having a small Euclidean norm but is also computationally inexpensive when compared with a commercial LP solver applied to solving problem RMP. Moreover, on average, we obtain a reduction of more than 75% in dual norm values when compared with this commercial solver, as we shall see in §4. We combine this deflected subgradient scheme with an efficient implementation of the resource-constrained shortest-path approach for solving the column generation subproblem, which manages dual noise in an efficient manner without significantly increasing the computational burden.

3. A Deflected Subgradient Method for Solving Linear Programs
In this section, we present some promising subgradient-based methods for solving the Lagrangian duals to problems MQP and RMP. These approaches are nondifferentiable optimization (NDO) methods that involve moving along deflected subgradient (DSG) directions to derive near-optimal, minimum-norm dual solutions to RMP and MQP in the context of column generation.

Baker and Sheasby (1999) have shown that by applying exponential smoothing to the subgradient vector, the convergence process can be accelerated. The volume algorithm (VA) of Barahona and Anbil (2000) is based on an efficient approximation scheme similar to Baker and Sheasby's (1999) method of exponential smoothing, and uses moving averages to generate an estimate of the primal vector along with a heuristic dual solution. However, the VA in its original form does not guarantee primal or dual convergence, but could be modified to obtain dual convergence. For example, Sherali and Lim (2004) have enhanced the VA concept by viewing it as a special case of the deflected subgradient approach in the dual space, and have accordingly embedded the VA dual update scheme in the convergent variable-target value method (VTVM) developed by Sherali et al. (2000) to obtain encouraging results. In §4, we provide some comparative computational results using this scheme as well.

3.1. Dual-Norm-Based Deflected Subgradient Method (DSG)
Consider the following linear program LP, which represents RMP in convenient form, and its corresponding Lagrangian Dual LD:

$$\text{LP: Minimize } c^T x \quad \text{subject to } Ax = b, \; x \in X, \tag{11}$$

where $X = \{x : 0 \le x \le e\}$.

$$\text{LD: Maximize } \{\theta(\pi) : \pi \text{ unrestricted}\} \tag{12}$$

where $\theta(\pi)$ is given by the optimal value to the Lagrangian subproblem:

$$\text{LS: Minimize } \{c^T x + \pi^T (b - Ax) : x \in X\} \tag{13}$$

The method we propose is an implicit variable target value approach that can be used to quickly generate good-quality solutions to LD in practice. Our goal is to construct a sequence of incumbent dual vectors by using the product of the Euclidean norms of the incumbent dual vector and the current search direction to assess an implicit target value for determining the step size for generating the next dual vector, where the search direction is derived using a suitable convex combination of the current subgradient and the previous direction. We provide some motivation for this choice of step length below.

3.1.1. Choice of Step Length. Given a nondifferentiable concave function $\theta$, and a search direction vector $d$ belonging to the subdifferential at $\pi$, we obtain the following inequalities by the concavity property of $\theta$ and using the Cauchy-Schwarz inequality, where $\pi^*$ solves LD:

$$\theta(\pi^*) - \theta(\pi) \le (\pi^* - \pi)^T d \le (\|\pi^*\| + \|\pi\|)\,\|d\| \tag{14}$$

For any dual solution $\pi$, let us define an implicit target value difference

$$\Delta_T = T - \theta(\pi) \tag{15}$$

where $T$ is an implicit target value at any particular iteration, with the restriction that it cannot be less than some

$$T_{\min} = \theta(\pi) + \rho\,\varepsilon_T \|d\|^2 \tag{16}$$

for a given $\varepsilon_T > 0$. Let the implicit target value be such that $\Delta_T = \rho\,\|\pi\|\,\|d\|$, so that

$$T \equiv \theta(\pi) + \rho\,\|\pi\|\,\|d\| \quad \text{and} \quad T \ge T_{\min} \Rightarrow \|\pi\| \ge \varepsilon_T \|d\| \tag{17}$$

where $\rho$ is a factor prescribed below depending on two possibilities that can arise.
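The Lagrangian subproblem (13) and the implicit target value of (16) and (17) can be made concrete with a small sketch. Because X is the unit box, LS separates by column: a variable is set to 1 exactly when its reduced cost is negative. The function names (`solve_ls`, `implicit_target`), the dense list-of-lists matrix layout, and the tiny test instance are illustrative assumptions, not part of the method as published.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def solve_ls(c, A, b, pi):
    """Solve LS(pi): min c^T x + pi^T (b - A x) over the box 0 <= x <= 1.

    Returns (x, theta, g), where theta is the Lagrangian value and
    g = b - A x is a subgradient of theta at pi.
    A is an m-by-n matrix stored as a list of rows (an assumed layout).
    """
    m, n = len(b), len(c)
    theta = sum(pi[i] * b[i] for i in range(m))
    x = [0.0] * n
    for j in range(n):
        reduced = c[j] - sum(pi[i] * A[i][j] for i in range(m))
        if reduced < 0:          # only negative reduced costs lower the objective
            x[j] = 1.0
            theta += reduced
    g = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return x, theta, g


def implicit_target(theta, pi, d, rho=2.0, eps_T=1.0):
    """Implicit target value T of (17), safeguarded from below by T_min of (16)."""
    T = theta + rho * norm(pi) * norm(d)
    T_min = theta + rho * eps_T * norm(d) ** 2
    return max(T, T_min)


if __name__ == "__main__":
    # Hypothetical 2-row, 3-column set-partitioning-style instance.
    c = [2.0, 3.0, 4.0]
    A = [[1, 0, 1], [0, 1, 1]]
    b = [1, 1]
    x, theta, g = solve_ls(c, A, b, [3.0, 3.0])
    print(x, theta, g, implicit_target(theta, [3.0, 3.0], g))
```

The default `rho=2.0` anticipates the value selected in the case analysis that follows; `eps_T=1.0` matches the setting used later when the method is summarized.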

Case 1. $\|\pi^*\| \le \|\pi\|$. In this case, (14) yields $\theta(\pi^*) - \theta(\pi) \le 2\|\pi\|\,\|d\|$, and noting (17), we therefore obtain an implicit target value $T$ that is an upper bound on the optimal objective value by selecting $\rho \ge 2$.

Case 2. $\|\pi\| < \|\pi^*\|$. This implies from (14) that $\theta(\pi^*) - \theta(\pi) \le 2\|\pi^*\|\,\|d\|$. Depending on the current dual norm, two subcases arise.

Case 2a. $2\|\pi\|\,\|d\| \le \theta(\pi^*) - \theta(\pi) \le 2\|\pi^*\|\,\|d\|$. In this situation, by (17) a choice of $\rho \le 2$ yields an implicit target value $T$ that is a lower bound on the optimal Lagrangian dual value.

Case 2b. $\theta(\pi^*) - \theta(\pi) \le 2\|\pi\|\,\|d\| \le 2\|\pi^*\|\,\|d\|$. This situation is similar to Case 1, so again, a choice of $\rho \ge 2$ results in an implicit target value $T$ that is an upper bound on the optimal objective.

Motivated by these cases, we select a value of $\rho = 2$ to construct our implicit target value as prompted by (17) according to

$$T \equiv \theta(\bar{\pi}) + 2\|\bar{\pi}\|\,\|d_t\| \text{ at iteration } t, \quad \text{with } T_{\min} \equiv \theta(\bar{\pi}) + 2\varepsilon_T \|d_t\|^2 \tag{18}$$

where $\bar{\pi}$ and $d_t$ (to be specified) are, respectively, the incumbent dual vector and the search direction at iteration $t$. The prescribed dual update scheme is then given by

$$\pi_{t+1} = \bar{\pi} + \frac{\lambda\,[T - \theta(\bar{\pi})]}{\|d_t\|^2}\, d_t = \bar{\pi} + \frac{2\lambda\,\|\bar{\pi}\|\,\|d_t\|}{\|d_t\|^2}\, d_t = \bar{\pi} + \frac{2\lambda\,\|\bar{\pi}\|}{\|d_t\|}\, d_t \tag{19}$$

where $0 < \lambda_{\min} \le \lambda \le 1$ is a suitable step-length parameter. This yields the following bounds on the resulting norm of the dual solution by the Cauchy-Schwarz inequality:

$$1 - 2\lambda \le \frac{\|\pi_{t+1}\|}{\|\bar{\pi}\|} \le 1 + 2\lambda \;\Rightarrow\; -2\lambda \le \frac{\|\pi_{t+1}\| - \|\bar{\pi}\|}{\|\bar{\pi}\|} \le 2\lambda \tag{20}$$

Starting with $\bar{\pi} = 0$, $t = 1$, and $d_1 = g_1$, the dual update (19) results in a sequence of dual solutions that attempts to improve the Lagrangian dual objective function $\theta(\cdot)$ by adopting a prescribed step of length $2\lambda\|\bar{\pi}\|$ along the normalized DSG direction $d_t/\|d_t\|$ at iteration $t$. Note that whenever $T$ is less than $T_{\min}$ as defined in (18), we use $T_{\min}$ in lieu of $T$ in (19), giving the dual update scheme

$$\pi_{t+1} = \bar{\pi} + 2\lambda\,\varepsilon_T\, d_t \tag{21}$$

As far as the direction $d_t$ is concerned, several deflection schemes have been prescribed in the literature (see, for example, the discussion in Sherali and Choi 1996). Any of these schemes could be used in the above method. We shall adopt an exponential smoothing scheme that appears to empirically accelerate dual convergence, with the added simplification of simply choosing the step-length parameter $\lambda$ itself as the coefficient for the smoothing scheme. Under this approach, the direction $d_t$ is derived as a convex combination of the current subgradient direction and the previous search direction vector, as given below:

$$d_t = \lambda g_t + (1 - \lambda)\, d_{t-1} \tag{22}$$

where $g_t$ is a subgradient of $\theta$ at $\pi_t$, which is obtained by solving the Lagrangian subproblem given by (13), and $\lambda$ is the current step-length parameter. Letting $x_t$ be an optimal solution to LS($\pi_t$), we can use $g_t = b - Ax_t$. Observe that $d_t$ is essentially a deflected subgradient direction given by $(1/\lambda)\, d_t = g_t + \psi\, d_{t-1}$, where $\psi \equiv (1 - \lambda)/\lambda$. The resulting expression is of a similar general type to the alternative specific procedures used in Baker and Sheasby (1999) and Barahona and Anbil (2000).

Note that if we use any convergent variable-target-based scheme (such as that in the VTVM method of Sherali et al. 2000, for example), the resultant deflected subgradient methodology based on (22) satisfies the requirement for convergence to a dual-optimal solution (Sherali and Lim 2004). Using the self-correcting target scheme (18) in concert with (19), (21), and (22) reduces the number of algorithmic parameters by at least two when compared with the VTVM scheme.

To summarize the proposed deflected subgradient method for optimizing LD, we start with an initial dual incumbent $\bar{\pi} \equiv 0$ as the iterate $\pi_t$ at $t = 1$, and define $\bar{\theta} \equiv \theta(\bar{\pi})$, set $\varepsilon_T = 1$, and use $d_1 \equiv g_1$ as the search direction. The dual solution is then updated at any iteration $t$ by taking a positive step length proportional to the incumbent dual solution's norm in the normalized DSG direction as stated below, where we have used (18), (19), and (21) to obtain (23a):

$$\pi_{t+1} = \begin{cases} \bar{\pi} + 2\lambda\,\|\bar{\pi}\|\, \dfrac{d_t}{\|d_t\|} & \text{if } \|\bar{\pi}\| \ge \varepsilon_T \|d_t\|, \\[2mm] \bar{\pi} + 2\lambda\,\varepsilon_T\, d_t & \text{otherwise.} \end{cases} \tag{23a}$$

$$\theta_{t+1} = \min_{x \in X} \{c^T x + \pi_{t+1}^T (b - Ax)\} = c^T x_{t+1} + \pi_{t+1}^T g_{t+1}, \quad \text{where } g_{t+1} \equiv b - Ax_{t+1} \tag{23b}$$

$$\text{If } \theta_{t+1} > \bar{\theta}, \text{ then set } \bar{\pi} = \pi_{t+1} \text{ and } \bar{\theta} = \theta_{t+1} \tag{23c}$$

$$\text{Let } d_{t+1} = \lambda g_{t+1} + (1 - \lambda)\, d_t \text{ and reiterate} \tag{23d}$$

In practice, we also use the projected quadratic-fit line search during the first 1,000 iterations to accelerate convergence, as described in Lim and Sherali

(2006). Note that the dual update in (23a) requires a step from the incumbent dual vector, unlike most other subgradient optimization schemes that adopt a step length along the (deflected) subgradient direction from the most recent dual vector. We observed that the latter scheme, while being competitive in terms of the final solution gap, tends to produce relatively larger dual norms in practice. Furthermore, (20) specifies that the bounds on the norm of the next dual iterate in the DSG scheme are proportional to the norm of the dual vector from which we step. Toward the aim of obtaining relatively smaller dual norms, especially when solving Problem MQP, we can modify (23a) in light of (20) by stepping from the most recent dual vector $\pi_t$ (and also defining $T$ and $T_{\min}$ with $\bar{\pi} \leftarrow \pi_t$ in (18)) whenever $\|\pi_t\| < \|\bar{\pi}\|$. This leads to the alternative update scheme:

$$\pi_{t+1} = \begin{cases} \tilde{\pi} + 2\lambda\,\|\tilde{\pi}\|\, \dfrac{d_t}{\|d_t\|} & \text{if } \|\tilde{\pi}\| \ge \varepsilon_T \|d_t\|, \\[2mm] \tilde{\pi} + 2\lambda\,\varepsilon_T\, d_t & \text{otherwise} \end{cases} \tag{23e}$$

where

$$\tilde{\pi} = \begin{cases} \pi_t & \text{if } \|\pi_t\| < \|\bar{\pi}\|, \\ \bar{\pi} & \text{otherwise.} \end{cases} \tag{23f}$$

In our proposed procedure using either (23a) or (23e) and (23f), the value of $\lambda$ is initially set to 1.0, and is periodically halved whenever the incumbent objective improvement falls below a threshold level as shown below. Specifically, let $\varepsilon_\theta = 0.1$ be a threshold value for the ascent in incumbent objective achieved over the most recent $\tau_{\min} = 10$ iterations, below which the value of $\lambda$ is halved. The value of $\tau$ that represents the next iteration to check for updating $\lambda$ is initialized at $\tau = \tau_{\min} + 1 = 11$, the previous incumbent reference value is initialized at $\hat{\theta} = \bar{\theta}$, and the minimum permissible value for $\lambda$ is set at $\lambda_{\min} = 10^{-4}$. Then, the scheme for managing $\lambda$ operates as follows, where $\bar{\theta}$ represents the current incumbent objective value at iteration $t$. If $t = \tau$, then do: If $\bar{\theta} - \hat{\theta} < \varepsilon_\theta$, then set $\lambda \leftarrow \max\{\lambda_{\min}, \lambda/2\}$; else, retain $\lambda$. Also, reset $\hat{\theta} \leftarrow \bar{\theta}$ and increment $\tau \leftarrow t + \tau_{\min}$.

The DSG procedure can be terminated when $\max_{i=1,\ldots,m} |d_{ti}| < \varepsilon_d$, where $\varepsilon_d$ is some small positive tolerance, or if we reach a specified maximum iteration limit.

In lieu of directly solving the Lagrangian dual LD formulated for the linear program RMP, we can alternatively solve a similar Lagrangian dual formulated for MQP using the same approach, along with the modifications specified by (23e) and (23f), without any significant increase in computational effort required per iteration. We present some computational experience on applying the prescribed algorithmic strategies to both LD and the Lagrangian dual to MQP in the next section.

Convergence to an optimal solution can be induced by using the following two-phase approach. In phase I, the implicit target value DSG heuristic described above can be run for up to $t^{I}_{\max}$ iterations. We can then switch over to the convergent VTVM-based target value algorithm of Lim and Sherali (2006) for phase II, using the same deflected subgradient-based direction. In this process, the implicit target value obtained at the end of phase I can be gainfully employed as the initial target value for phase II. Given the heuristic nature of phase I, and the demonstrated effectiveness of the Lim and Sherali (2006) procedure, this phase II augmentation serves as a useful practical safeguard in attaining good quality dual solutions via the overall two-phase scheme, besides simply ensuring theoretical convergence. An alternative approach would be to use the dual estimate obtained at the end of phase I to initialize the dual simplex method in phase II, in lieu of using a convergent NDO procedure such as VTVM.

4. Computational Experience
In this section, we provide sample results on five large-scale crew pairing SPP test sets that are derived from historical schedules at United Airlines. We also embed the proposed DSG scheme for RMP and MQP within ACRUZER©, United's Parallel Crew Pairing Solver, to evaluate its performance within the CG framework described in §2 using five other real-life proprietary test cases selected from a variety of fleet types and different operational crew regulations. In addition, we compare the performance of DSG to that of the interior point barrier solver and the dual simplex solver in CPLEX 9.1. In this context, we report statistics on the percentage optimality gap after every 1,000 iterations of the DSG scheme and the CPU run time after 3,000 iterations, as well as the percentage ratio of the DSG method's final dual norm to the corresponding value obtained for the barrier solver of CPLEX 9.1. To focus purely on algorithmic performance, all DSG runs were made with no prior knowledge of the optimal solution, no preprocessing of the problem or exploitation of inherent network structures to reduce its size, and starting with a zero dual vector in each case. For our implementation, we have used a Linux-based Intel Pentium 4, 2.6 GHz computer having 1.5 GB of RAM, and with serial processing only. The CPLEX barrier solver was applied to solve the five SPP examples to optimality using default parameter settings and with no crossover to the simplex method. We also present results for the MQP-based approach and the convergent VTVM scheme.

Table 1 presents statistics for the computational time and the solution value obtained by solving the

Table 1 Computational Results Using CPLEX-Barrier to Solve the LP Relaxation of SPP to Optimality

Columns: Data set, Rows, Columns, Final objective, Dual norm, CPU time (sec.)

CREW1 10683102 400 39337804 186889 141
CREW2 467359 255 6477734833504 167
CREW3 15 360 163373 53373424 658642 401
CREW4 15511 69989 48358712 723191 201
CREW5 2296 20334 942110231 935920 17

Table 3 Results for the DSG Solver Applied to the Lagrangian Dual of Problem MQP

Data set   Gap-3,000 %   MQP/LD dual norm % ratio   CPU time (sec.) for 3,000 iter.
CREW1      0.61          60.53                      30
CREW2      5.88          54.84                      18
CREW3      5.89          95.02                      49
CREW4      1.27          76.08                      21
CREW5      9.07          94.59                      3

LP relaxation of SPP to optimality using the barrier solver option of CPLEX 9.1. Table 2 displays the results for solving the same LP relaxation via its Lagrangian dual using a stand-alone version of the proposed DSG solver. We can observe from Table 2 that the DSG method found a dual solution to within 10% of the optimal objective value after no more than 3,000 iterations for all the data sets. Furthermore, the Euclidean norm of the DSG-generated dual vector is significantly smaller than that generated by CPLEX (where the former is expressed as a percentage of the latter in the penultimate column of Table 2), being about 6%–43% of the latter value in magnitude, and yielding an average reduction factor of 4.5 in the dual noise factor (see Equation (9)). Furthermore, as evident from Table 3, it is possible to generate smaller dual norms by solving the Lagrangian dual to MQP, where we have used a relatively small value of $10^{-7}$ for the MQP parameter, and applied (23e) and (23f) instead of (23a) for MQP. When compared with Table 2, we obtained a decrease in the optimality gap for the largest CREW3 instance but with some degradation in solution quality for the other four cases. The dual norm ratio column of Table 3 indicates a reduction in the magnitude of the dual norm in all cases, yielding an average decrease of 24% beyond the dual norms generated by solving LD. Although greater reductions in the magnitude of the dual norm may be possible by using larger values for the MQP parameter, using disproportionately large values of this parameter could lead to increasingly suboptimal solutions. We recommend solving the Lagrangian dual to Problem MQP whenever a significant reduction in dual noise is critical for avoiding stalling of the column generation scheme. We report on such examples later in Table 5.

Next, we present in Table 4 the performance of the convergent VTVM method using the generalized Polyak-Kelley cut (GPKC) scheme, as described in Lim and Sherali (2006). The DSG method performs significantly better in terms of run time and solution quality for these crew planning instances, where the goal is to quickly obtain good dual solutions. In all our test cases, the phase I solution itself was within an acceptable tolerance. However, to provide a practical safeguard, and to ensure theoretical convergence to optimality (or near optimality), we recommend running the two-phase DSG-VTVM scheme or the DSG-dual simplex scheme described in §3.

Finally, Table 5 compares the performance of the ACRUZER© crew solver using CPLEX-barrier, CPLEX-dual simplex, and the DSG scheme applied to each of RMP and MQP. A relative complementarity tolerance of 0.1 was used for the barrier method, while the DSG scheme was run using a termination tolerance value of $\varepsilon_d = 0.05$, and with an iteration limit of 2,000. All approaches, except the barrier-based solver, used an advanced start to initialize any CG iteration, either using the most recent basis for the dual simplex method, or the most recent dual estimate for the DSG approach. We have used a relatively large value of $10^{-4}$ for the MQP parameter in the MQP-based scheme within ACRUZER to aggressively seek dual solutions having small norm values and thereby analyze the resultant behavior of the DSG scheme under low dual noise conditions. The statistics presented in Table 5 are derived from the ACRUZER runs made on a representative set of large-scale SPP instances that were encountered in historical scheduling months. Although the data sets in this section are proprietary, a relatively simple way to generate synthetic SPP data sets is to extract flight schedule data from the Official

Table 2 Results for the DSG Solver Using an Iteration Limit of 3,000

Data set   Gap-1,000 %   Gap-2,000 %   Gap-3,000 %   Dual norm % ratio   CPU time (sec.) for 3,000 iter.
CREW1      0.62          0.35          0.27          8.64                30
CREW2      6.99          1.30          0.57          19.00               18
CREW3      8.37          6.05          6.05          5.62                48
CREW4      1.45          0.91          0.79          34.95               21
CREW5      48.28         16.20         8.38          42.73               3

Table 4 Results for the VTVM-GPKC Scheme

Data set   Gap-3,000 %   CPU time (sec.)
CREW1      4.34          89
CREW2      25.24         62
CREW3      14.86         162
CREW4      7.93          69
CREW5      5.20          7

Table 5 Comparative Results Within ACRUZER

Data set   Flight segments   CG iterations   RMP solver   Final LP objective value   Euclidean dual norm   CPU time (minutes)
PIL67      2000              10              BARRIER      2291459                    2260587               4
                                             SIMPLEX      2513306                    2383343               14
                                             DSG-RMP      2330865                    560658                2
                                             DSG-MQP      2311708                    317958                2
PIL37      5000              20              BARRIER      17665559                   3308198               298
                                             SIMPLEX      12166390                   3454063               473
                                             DSG-RMP      11444565                   2115149               14
                                             DSG-MQP      7927230                    92107                 16
FLTAT      16000             30              BARRIER      26663988                   5071235               610
                                             SIMPLEX      30629052                   5286123               644
                                             DSG-RMP      13666756                   4439346               20
                                             DSG-MQP      10652293                   399359                67
PIL20      10000             40              BARRIER      272271820                  5686905               516
                                             SIMPLEX      291986340                  5849908               428
                                             DSG-RMP      99237572                   4387686               39
                                             DSG-MQP      2313577                    83969                 54
STALL      5000              60              BARRIER      7003305                    2506208               377
                                             SIMPLEX      3257468                    2752465               358
                                             DSG-RMP      308713                     87107                 14
                                             DSG-MQP      439265                     89151                 26

Airline Guide website (http://www.oag.com) for any particular week or month to generate the rows and incorporate the FAA crew regulations as well as any relevant contract-specific rules, if available, within the subproblem procedure to generate realistic columns. The data sets are arranged in increasing order of difficulty and represent different aircraft fleet types. All examples except the FLTAT data set (which is based on the crew regulations for flight attendant scheduling) use pilot scheduling rules. Table 5 reports on the problem size in terms of the number of flight segments, the number of CG iterations performed within ACRUZER, the final (optimal) LP objective value achieved, the Euclidean dual norm obtained after the penultimate CG iteration (i.e., before reoptimizing the final RMP to optimality using CPLEX), and the CPU run time for the CG scheme for each of the four approaches. The final LP objective value reported in Column 5 corresponds to the optimal LP solution at the end of the CG procedure (computed by solving the LP relaxation of the final RMP using CPLEX). The CPU run time required to resolve the continuous relaxation of the master problem to optimality, as well as the run time within the subproblem heuristic across all iterations, are included in the CPU time reported in the final column. The last data set (STALL) illustrates a real-life example in which the CG procedure based on a commercial solver stalls and fails to reasonably converge after a large number of CG iterations.

Although the performance of the CPLEX-based solvers was somewhat competitive for the smallest data set (PIL67), we observed that as the SPP problem increased in size and complexity, the DSG-based approach increasingly dominated the former approach. The results from the last two data sets (PIL20 and STALL) indicate the drastic failure of the barrier- and simplex-based solvers in providing a practically useful set of columns at the end of the procedure. Furthermore, the black-box CG approach frequently had to be abandoned in practice for virtually all large data sets because the preset column storage limit was quickly reached and no significant improvement in the objective was obtained beyond that point. The DSG-based approach always yielded the smallest norm, while the DSG-MQP combination, in particular, generated dual solutions having a significantly smaller dual norm in all test cases with accompanying relatively better or competitive RMP formulations. As far as computational expense is concerned, the proposed DSG-based methods outperformed the CPLEX CG schemes by a factor of 20 for the more difficult data sets (all except PIL67). Note that while Table 2 implies an average speed-up factor of about 10 when using 2,000 DSG iterations in a stand-alone implementation, an embedded DSG scheme takes advantage of an advanced-start capability from the previous iteration, thereby further improving the relative computational performance over several iterations.

The improvement in the final solution quality attained can be attributed to the ability of the DSG scheme to limit dual noise, thereby generating relatively better columns at every CG iteration. This observation is supported by the strong correlation observed in virtually all instances between small dual norms as reported in Column 6 of Table 5 and the LP objective value realized as reported in Column 5.
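For readers who wish to experiment with the stand-alone DSG iteration evaluated in this section, the updates (23a)–(23d) of §3, together with the rule for halving the step-length parameter and the termination test on the direction, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ACRUZER implementation: it omits the projected quadratic-fit line search, the variant (23e)–(23f), and phase II; it checks the halving rule at the end of iteration t (the text does not fix the exact ordering); and the function name `dsg`, the dense matrix layout, and the tiny instance in the usage note are hypothetical.

```python
import math


def dsg(c, A, b, max_iter=3000, eps_T=1.0, eps_theta=0.1,
        tau_min=10, lam_min=1e-4, eps_d=0.05):
    """Sketch of the implicit-target DSG heuristic for max theta(pi),
    where theta(pi) = min {c^T x + pi^T (b - A x) : 0 <= x <= 1}.
    Returns the incumbent dual vector and its Lagrangian value."""
    m, n = len(b), len(c)

    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    def solve_ls(pi):
        # Closed-form solution over the unit box: x_j = 1 iff reduced cost < 0.
        theta = sum(pi[i] * b[i] for i in range(m))
        x = [0.0] * n
        for j in range(n):
            reduced = c[j] - sum(pi[i] * A[i][j] for i in range(m))
            if reduced < 0:
                x[j] = 1.0
                theta += reduced
        g = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        return theta, g

    pi_bar = [0.0] * m                      # incumbent dual vector, starts at zero
    theta_bar, g = solve_ls(pi_bar)
    d = list(g)                             # d_1 = g_1
    lam = 1.0                               # step-length parameter
    tau = tau_min + 1                       # next iteration at which lam is checked
    theta_ref = theta_bar                   # reference incumbent value

    for t in range(1, max_iter + 1):
        if max(abs(di) for di in d) < eps_d:
            break                           # direction is (nearly) zero
        n_pi, n_d = norm(pi_bar), norm(d)
        if n_pi >= eps_T * n_d:             # first branch of (23a)
            step = 2.0 * lam * n_pi / n_d
        else:                               # safeguarded branch, as in (21)
            step = 2.0 * lam * eps_T
        pi_next = [p + step * di for p, di in zip(pi_bar, d)]
        theta_next, g = solve_ls(pi_next)   # (23b)
        if theta_next > theta_bar:          # (23c)
            pi_bar, theta_bar = pi_next, theta_next
        d = [lam * gi + (1.0 - lam) * di for gi, di in zip(g, d)]  # (23d)
        if t == tau:                        # periodic halving of lam
            if theta_bar - theta_ref < eps_theta:
                lam = max(lam_min, lam / 2.0)
            theta_ref = theta_bar
            tau = t + tau_min
    return pi_bar, theta_bar
```

On a hypothetical instance with c = [2, 3, 4], rows [1, 0, 1] and [0, 1, 1], and b = [1, 1], calling `dsg(c, A, b)` returns a Lagrangian value that, by weak duality, cannot exceed the LP optimum of that instance. Stepping from the incumbent rather than from the most recent iterate is what ties the step length to the incumbent's norm, in line with the bounds in (20).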

[Figure 2 plot: objective value (logarithmic axis, 100,000 to 1,000,000,000) versus CG iteration (1 to 59) for DSG-RMP, DSG-MQP, SIMPLEX, and BARRIER.]

Figure 2 Progress in the Objective Value Within ACRUZER Across the CG Iterations for the STALL Data Set

Furthermore, when we used MQP to seek dual solutions having small norms, we observed a significant improvement in solution quality in almost all examples. In addition, the DSG-RMP solver required about 30% less computational time on average, due to the relatively faster convergence of the RMP objective. In practice, to maintain a good balance between computational performance and solution quality, we recommend using MQP within the CG procedure and gradually reducing the value of the MQP parameter, so that we approach a master program formulation that more closely resembles RMP toward the end of the procedure.

Figure 2 compares the trend in the objective value across the CG iterations using each of the aforementioned four approaches for the STALL data set (we display the primal objective value for the CPLEX-based approaches, and the incumbent Lagrangian dual value $\bar{\theta}$ for the DSG-based approaches, except that at the final iteration of the DSG procedures, we record the exact LP value obtained via CPLEX), while Figure 3 compares the value of the dual norm at the corresponding CG iterations. The objective values in Figure 2 are plotted on a logarithmic scale to highlight the substantial performance difference between the CPLEX- and DSG-based approaches. While the DSG-RMP scheme always converged to within the preset tolerance, the DSG-MQP approach terminated with relatively large optimality gaps in four of the 60 CG iterations due to the relatively large value of the MQP parameter used in these tests. Consequently, we observe a pronounced nonmonotonic trend in $\bar{\theta}$ and the dual norm for a few iterations in Figures 2 and 3, respectively. However, as noted above, the objective value plotted after the final iteration of the DSG-based schemes is the optimal LP objective value obtained by reoptimizing RMP using CPLEX, resulting in the final "jump" in Figure 2. (The LP values reported in Table 5 are these exact final optimal values.) The trend in these objective values indicates that the CPLEX-based methods exhibit a relatively slow rate of improvement even after 60 CG iterations. The DSG-RMP method converged after 57 iterations, while the DSG-MQP method stabilized after 25 iterations. Note that in the DSG-RMP case, the relatively rapid improvement in the Lagrangian objective after about the 50th iteration in Figure 2 coincides with an equally rapid reduction in dual norm values in Figure 3. Furthermore, the CPLEX-based

[Figure 3 plot: Euclidean dual norm (0 to 7,000,000) versus CG iteration (1 to 59) for DSG-RMP, SIMPLEX, BARRIER, and DSG-MQP.]

Figure 3 Variations in the Euclidean Norm of the Dual Solution Across the CG Iterations Within ACRUZER for the STALL Data Set

approaches are characterized by relatively high dual noise throughout the procedure in Figure 3, which correlates with the slow improvement in the objective function value and the relatively large gap from optimality at termination as evident in Figure 2. We observed a similar correlation in the general trends of the objective values and the corresponding dual norms in all the other examples as well.

Finally, we remark on the "production" robustness and versatility of the DSG scheme presented in this paper during industrial use. Proprietary versions of the DSG scheme have been successfully employed within the ACRUZER suite over the last four years and have yielded significant crew cost savings (estimated at $9.6 million per year and recording the lowest FTC value in history at United Airlines after its initial implementation), accompanied by an equally significant improvement in quality-of-work-life metrics, as well as a reduction in the crew planning cycle time at United Airlines. These successes have resulted in the proposed DSG scheme being additionally deployed without any significant modification within CG-based large-scale aircraft schedule planning applications at United Airlines as well, and has yielded equally successful results.

5. Conclusions and Future Research
We have proposed in this paper a new DSG scheme to efficiently generate good solutions to LP relaxations that arise in the context of a CG procedure for solving large-scale crew scheduling problems in practice. In particular, we have identified a phenomenon called dual noise that causes a standard commercial-solver-based CG approach implemented at United Airlines to stall far from optimality, and we have demonstrated that the proposed DSG scheme is able to considerably mitigate the effects of the dual noise phenomenon when used to solve the intermediate linear programs that arise within the CG scheme. The DSG scheme possesses several attractive features in comparison with other subgradient methods such as the use of a self-correcting target value, a minimal number of algorithmic parameters that need to be tuned, and empirically observed accelerated convergence to good-quality dual solutions. We have presented sample computational results to demonstrate

that the proposed methodology provides a viable alternative to a commercial LP solver within such airline crew planning applications.

Future research includes the possibility of modifying the proposed DSG scheme to derive a stand-alone theoretically convergent method without relying on a safeguarded phase II implementation of a known convergent procedure, and without compromising its observed effective empirical performance. Although the focus of the present work has been on large-scale crew planning problems, the DSG scheme is equally applicable to solving Lagrangian duals of general linear programs, such as relaxations that arise from 0–1 combinatorial problems in particular. It may also be beneficial to adopt alternative deflection schemes of the type described in Lim and Sherali (2006) and to evaluate the relative performances of these methods on a variety of optimization problems. Our preliminary steps in this direction based on results from test sets from Beasley (1990) have proven to be promising.

Acknowledgments
This research was supported in part by the National Science Foundation under Grants 0094462 and 0245643. The authors thank Raj Sivakumar, Krishnan Saranathan, and Steve Morley at Enterprise Optimization, United Airlines for their support during this research work, and Pradeep Kumar at Skytech Solutions for his assistance in preparing the test data.

References
Amor, H. B., J. Desrosiers, J. M. V. Carvalho. 2006. Dual-optimal inequalities for stabilized column generation. Oper. Res. 54(3) 454–463.
Baker, B. M., J. Sheasby. 1999. Accelerating the convergence of subgradient optimisation. Eur. J. Oper. Res. 117 134–144.
Barahona, F., R. Anbil. 2000. The volume algorithm: Producing primal solutions with a subgradient method. Math. Programming 87(3) 385–399.
Barahona, F., R. Anbil. 2002. On some difficult linear programs coming from set partitioning. Discrete Appl. Math. 118 3–11.
Barnes, E., V. Chen, B. Gopalakrishnan, E. L. Johnson. 2002. A least-squares primal-dual algorithm for solving linear programming problems. Oper. Res. Lett. 30 289–294.
Barnhart, C., E. L. Johnson, G. L. Nemhauser, M. W. P. Savelsbergh, P. H. Vance. 1998. Branch-and-price: Column generation for solving huge integer programs. Oper. Res. 46(3) 316–329.
Bazaraa, M. S., H. D. Sherali, C. M. Shetty. 2006. Nonlinear Programming: Theory and Algorithms, 3rd ed. John Wiley & Sons, New York.
Beasley, J. 1990. OR-library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 41(11) 1069–1072.
Carvalho, J. M. V. 2005. Using extra dual cuts to accelerate column generation. INFORMS J. Comput. 17(2) 175–182.
Desrosiers, J., Y. Dumas, M. M. Solomon, F. Soumis. 1995. Time constrained routing and scheduling. M. O. Ball, T. L. Magnanti, C. L. Monma, G. L. Nemhauser, eds. Handbook in Operations Research/Management Science, Network Routing, Vol. 8. Elsevier Science, Amsterdam, 35–139.
Elhallaoui, I., D. Villeneuve, F. Soumis, G. Desaulniers. 2005. Dynamic aggregation of set-partitioning constraints in column generation. Oper. Res. 53(4) 632–645.
Gunluk, O., T. Kimbrel, L. Ladanyi, B. Schieber, G. B. Sorkin. 2004. Vehicle routing and staffing for sedan service. http://www.optimization-online.org.
Hjorring, C. 2004. Solving larger crew pairing problems. Proc. TRISTAN V: The Fifth Triennial Sympos. Transportation Anal., Le Gosier, Guadeloupe.
Hu, J., E. L. Johnson. 1999. Computational results with a primal-dual subproblem simplex method: A new formulation and decomposition algorithm. Oper. Res. Lett. 25 149–157.
Kelley, J. E. 1960. The cutting-plane method for solving convex programs. SIAM J. Appl. Math. 8 703–712.
Klabjan, D., E. L. Johnson, G. L. Nemhauser, E. Gelman, S. Ramaswamy. 2001. Airline crew scheduling with regularity. Transportation Sci. 35(4) 359–374.
Klabjan, D., E. L. Johnson, G. L. Nemhauser, E. Gelman, S. Ramaswamy. 2002. Airline crew scheduling with time windows and plane-count constraints. Transportation Sci. 36(3) 337–348.
Kohl, N. 2003. Solving the world's largest crew scheduling problem. ORbit Extra, Danish Oper. Res. Soc. 4(Special Issue, August) 8–12.
Lim, C., H. D. Sherali. 2006. Convergence and computational analyses for some variable target value and subgradient deflection methods. Comput. Optim. Appl. 34(3) 409–428.
Makri, A., D. Klabjan. 2004. A new pricing scheme for airline crew scheduling. INFORMS J. Comput. 16(1) 56–67.
Merle, O. D., D. Villeneuve, J. Desrosiers, P. Hansen. 1999. Stabilized column generation. Discrete Math. 237 229–237.
Nemhauser, G. L., L. A. Wolsey. 1999. Integer and Combinatorial Optimization, 2nd ed. Wiley Interscience, New York.
Sherali, H. D., G. Choi. 1996. Recovery of primal solutions when using subgradient optimization methods to solve Lagrangian duals of linear programs. Oper. Res. Lett. 19 105–113.
Sherali, H. D., C. Lim. 2004. On embedding the volume algorithm in a variable target value method. Oper. Res. Lett. 32 455–462.
Sherali, H. D., G. Choi, C. H. Tuncbilek. 2000. A variable target value method for nondifferentiable optimization. Oper. Res. Lett. 26 1–8.
Vance, P. H., A. Atamurk, C. Barnhart, E. Gelman, E. L. Johnson, A. Krishna, D. Mahidhara, G. L. Nemhauser, R. Rebello. 1997. A heuristic branch-and-price approach for the airline crew pairing problem. Technical Report LEC-97-06, Department of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta.