Inside This Issue...
• Message from the Chair
• Message from the Newsletter Editor
• 2019 ICS Awards
• Message from the EIC of IJOC
• The Test of Time Award for IJOC papers published in 2004-2008
• Research Highlight: Recent Developments in Pyomo
• Research Highlight: A Unified Approach to Mixed-Integer Optimization: Nonlinear Formulations and Scalable Algorithms
• Research Highlight: Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency
• Research Highlight: Robust Hypothesis Testing with Wasserstein Uncertainty Sets

Message from the Chair

Simge Küçükyavuz
Industrial Engineering and Management Sciences, Northwestern University, [email protected]

Happy 2020, INFORMS Computing Society members! With the explosion of data and the ubiquity of hyper-scaled compute capabilities, there has never been a more exciting time to be at the interface of OR and computing. Our members have an unprecedented opportunity to make an impact on computing research and practice. However, we must strive to make our computational advances more widely accessible. To this end, we are emphasizing the importance of making algorithms, code, and datasets available to our broader community.

I am excited to start my term as the new ICS chair this year. I would like to thank Cole Smith for his leadership and service to ICS over the past two years. He has also been a great resource as I transition into my new role. Cole led the development of the new "Harvey J. Greenberg Research Award" in honor of our dear friend of ICS, Harvey Greenberg, who passed away in 2018. This new ICS award honors research excellence in the field of computation and OR applications, especially in emerging domains. Thanks to Dr. Bill Pierskalla's generosity, and a number of university and individual sponsors, we have raised over $30K, far exceeding our original funding goals. More details on this award and its sponsors are forthcoming. Please consider submitting your best work at the interface of computing and emerging applications to this award.

The 2021 ICS Conference will be in Tampa, Florida. Thanks to the local organizing committee, Changhyun Kwon, Hadi Charkhgard, and Tapas Das, for an excellent proposal. Stay tuned for more details and please make attending this conference a high priority.

"The Society is a great opportunity to get involved, meet some new people and generally combine your professional interests with a little fun."

Enjoy the newsletter with highlights on our award winners from INFORMS 2019 and their exciting research!
Message from the Newsletter Editor
Yongjia Song, PhD
Assistant Professor, Department of Industrial Engineering, Clemson University, [email protected]

It is time to share the news of the society again, and it is my pleasure to put things together. In this newsletter, please find updates on the society officers, the board of directors, the INFORMS Journal on Computing, as well as research highlights and insights from the 2019 ICS awards. Special thanks to all who contributed to this newsletter! I would also greatly appreciate any comments or suggestions that you may have for the newsletter.

Officers
• Chair: Simge Küçükyavuz, Northwestern University, [email protected]
• Vice-Chair/Chair-Elect: Akshay Gupte, The University of Edinburgh
• Secretary/Treasurer: Mary Fenelon, MathWorks, [email protected]

Board of Directors
• Kevin Furman, ExxonMobil, [email protected]
• Illya Hicks, Rice University, [email protected]
• Willem-Jan van Hoeve, Carnegie Mellon University, [email protected]
• Bjarni Kristjansson, Maximal Software, [email protected]
• Yongjia Song, Clemson University, [email protected]
• Juan Pablo Vielma, Massachusetts Institute of Technology, [email protected]

Editors
• INFORMS Journal on Computing: Alice E. Smith, Auburn University, [email protected]
• ICS Newsletter: Yongjia Song, Clemson University, [email protected]

2019 ICS Prize

The ICS Prize is an annual award for the best English language paper or group of related papers dealing with the Operations Research/Computer Science interface. The 2019 prize was awarded to William E. Hart, Carl D. Laird, Jean-Paul Watson, David L. Woodruff, Gabriel A. Hackebeil, Bethany L. Nicholson, and John Siirola for spearheading the creation and advancement of Pyomo, an open-source software package for modeling and solving mathematical programs in Python.

Committee members: Doug Shier (Clemson University, Chair), Fatma Kilinc-Karzan (Carnegie Mellon University), Tom Sharkey (Rensselaer Polytechnic Institute)

The 2019 ICS Best Student Paper Award

The ICS Student Paper Award is an annual award for the best paper at the interface of computing and operations research by a student author. The 2019 prize was awarded to Ryan Cory-Wright and Jean Pauphilet, Massachusetts Institute of Technology. Award-winning paper: "A Unified Approach to Mixed-Integer Optimization: Nonlinear Formulations and Scalable Algorithms".

Honorable mentions: Sebastián Pérez-Salazar, Georgia Tech, "Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency", and Liyan Xie, Georgia Tech, "Robust Hypothesis Testing Using Wasserstein Uncertainty Sets".

Committee members: Frank Curtis (Lehigh University, Chair), Sergiy Butenko (Texas A&M University), and Georgina Hall (INSEAD).

The Harvey J. Greenberg Award for Service

The Harvey J. Greenberg Award for Service is given to an individual in recognition of their impact on the ICS. The award is named in honor of Harvey J. Greenberg, whose service to the Society exemplifies the spirit of the award. It is given at most every two years. The 2019 award was given to Allen Holder of Rose-Hulman Institute of Technology.

Committee members: Karen Aardal (TU Delft, Chair), Ted Ralphs (Lehigh University), and Andrea Lodi (Polytechnique Montréal)

Message from the Editor-in-Chief of INFORMS Journal on Computing
by Alice E. Smith (Department of Industrial and Systems Engineering, Auburn University, [email protected])
Happy 2020, all INFORMS Computing Society members! INFORMS Journal on Computing continues to expand both in scope and in the quantity of papers published. We now have over 70 editors and advisory group members, including a large proportion of new area and associate editors. IJOC has ten technical areas, including the recent Software Tools area, which is a unique place to publish research that results in innovative software to share.

In 2019 we instituted a policy that defines how journal submissions may overlap with previously appearing conference papers. This will better inform editors, reviewers, and authors, while still allowing a great deal of flexibility.

Importantly, we (the editors and staff of the journal) are working very hard to reduce both the time to review and the time to publication, so your papers can be processed and appear in a reasonable timeframe. We are also now considering proposals for special issues. A special issue should address an emerging area, or one that is underserved by our current journal but is worthy of attention by the OR/CS research community. We have also instituted a few new awards, including Meritorious Paper (to recognize exceptional papers as they are accepted to the journal) and Meritorious Reviewer (to recognize the unsung heroes who go beyond the call of duty in reviewing). Our flagship new award is the Test of Time Award, which is detailed in the next article of this newsletter.

We are grateful for the support of the society as INFORMS JOC continues to advance, and I invite your suggestions and comments to dynamically shape our journal to best serve our research community. And please continue to send your impactful research papers to the journal and to serve as reviewers when asked.
The Test of Time Award for papers published in the INFORMS Journal on Computing in the years 2004-2008
We have started several new awards at the journal, including the IJOC Test of Time Paper Award. The specifics of this award are below:

• Number of awards: one per calendar year.
• Goal: recognition of a published IJOC paper which has proven impactful over a length of time. Considerations can be citations per year, downloads per year, influence in sparking new areas of research, practical implications, significance of findings, etc.
• Criteria: all papers published in the time window are considered. A paper can only be recognized with this award once. The time window is defined as a rolling window of five years starting 15 years ago.
• Deadline: none. Papers are considered on an annual basis.
• Selection: a small committee appointed by the editor-in-chief.
• Recognition: a certificate of the Test of Time Award (transmitted by email) and annual recognition in the journal (paper, authors, affiliations, citation).
• Procedure: the set of papers published in IJOC during the time window, with their citations per year (since publication), will be sent to the committee members for their deliberation. A winner is selected by the committee and the editor-in-chief is notified.

I am happy to report that a remarkably able committee chaired by John Chinneck, with members Bill Cook, Bruce Golden, Pascal Van Hentenryck, and David Woodruff, has selected the inaugural awardee, covering the period 2004 through 2008. What follows is the citation from the award committee and a reflection on the paper and this award by the authors. I want to thank the committee for their efforts and am very pleased to share this recognition of the impactful heritage of our journal.

The Test of Time Award for papers published in the INFORMS Journal on Computing in the years 2004-2008 is awarded to: Zsolt Ugray, Leon Lasdon, John Plummer, Fred Glover, James Kelly, and Rafael Martí, "Scatter search and local NLP solvers: A multistart framework for global optimization", INFORMS Journal on Computing 19(3):328-340, 2007. Not only has this paper attracted a large number of citations, but it continues to be highly cited to this day, many years following publication. In fact, a significant number of the citing publications are recent, showing that the authors were ahead of their time.

The paper describes the multi-start global optimization algorithm in the OptQuest solver, which can handle nonlinear programs in both continuous and integer variables, requiring only that the functions be differentiable with respect to the continuous variables. The algorithm suggests launch points for a local nonlinear solver, holding the integer variables constant, and is able to find global optima for most test problems using very few local solver launches (the most expensive operation).

As well as influencing academic researchers (as shown by the many citations), the paper has influenced the practice of global optimization. A large variety of applications have been successfully attacked using OptQuest, and commercial global optimization solvers such as LINDO have been influenced by the ideas described in the paper. OptQuest has also developed into a leading tool for simulation optimization.

Comments on the 2019 IJOC Test of Time Paper Award from the authors: When we started our work on nonlinear mixed-integer programming in 2004, we felt that there was an opportunity to expand on the designs of multistart algorithms to solve these problems. Our flexible framework to combine heuristics as trial solution generators, local solvers supplied with the trial solutions as starting points, procedures to deal with discrete variables, and the inclusion of starting point filters proved to be quite effective. The resulting system described in our 2007 paper incorporated the OptQuest implementation of the scatter search heuristic to generate the trial solutions and the LSGRG local solver to provide high accuracy solutions. Our work showed that the applications of OptQuest in the simulation optimization domain and of LSGRG in the nonlinear optimization domain could be integrated via a multistart framework to provide a useful technology for additional important classes of problems.

As of October 2019, there are 290 citations of this paper listed in Scopus. The three most cited areas are Engineering, Computer Science, and Mathematics. They indicate the usefulness of the approach for applications and as a starting point for further algorithm developments. Another mark of the versatility of the framework is the existence of multistart solvers in many modeling languages and platforms, including MATLAB, TOMLAB, AIMMS, LINGO, and Frontline Systems' Enhanced Excel Solver.

We would like to thank the Editors of the INFORMS Journal on Computing and the members of the 2019 IJOC Test of Time Paper Award committee for selecting our paper for this award. We also wish to remember our late colleague, John Plummer, who contributed so much to our work.

Zsolt Ugray, Leon Lasdon, Fred Glover, James Kelly, and Rafael Martí, 10/14/2019.
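The multistart idea at the heart of the awarded paper can be illustrated with a small, stdlib-only sketch: random launch points feed a local solver, and the best local optimum found is retained. This is an illustration of the general framework only, not the paper's algorithm: the paper uses scatter search (OptQuest) to propose starts and LSGRG as the local solver, whereas here the starts are uniform random, the "local solver" is a deliberately crude pattern search, and the test function is ours.

```python
import random

def local_search(f, x, step=0.5, tol=1e-6):
    # Crude derivative-free descent: try a step left/right, shrink the
    # step when neither direction improves. Stands in for a real local
    # NLP solver such as LSGRG; purely illustrative.
    fx = f(x)
    while step > tol:
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2.0
    return x, fx

def multistart(f, bounds, n_starts=20, seed=0):
    # Multistart framework: launch the local solver from many start
    # points and keep the best local optimum found.
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x, fx = local_search(f, rng.uniform(*bounds))
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

f = lambda x: (x * x - 4.0) ** 2 + x   # nonconvex: local minima near x = -2 and x = 2
x_star, f_star = multistart(f, (-3.0, 3.0))
```

A single local search launched from a start in the right basin stalls at the worse local minimum near x = 2; the multistart loop escapes it by also launching from the left basin.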
Research Highlight: Recent Developments in Pyomo
Pyomo provides support for expressing and solving optimization problems using Python. When the authors of the Pyomo book [5] were awarded the ICS Prize, we were asked to provide a project update for the newsletter. It was a great honor to be recognized by the society and we are happy to provide an update. While active development continues on the core Pyomo modeling components and solver interfaces with an eye toward improving efficiency, these activities occur "behind the scenes" with a strong emphasis on preserving backwards compatibility and ensuring a stable user experience. As such, the majority of the recent user-facing changes and developments have occurred through modeling extensions and related packages. This article outlines a few of the many user-facing capabilities that have recently been added to the Pyomo family or are in the development pipeline and will soon be available. Pyomo is used by industry and government as well as academics for both teaching and research. The Pyomo website is http://www.pyomo.org.

Chama

Chama [6] is a Python package built on Pyomo that provides mixed-integer, stochastic programming formulations to determine sensor locations and technologies that maximize the detection capabilities of sensor networks. Some of the methods in Chama were originally developed to design sensor networks to detect contamination in water distribution systems [2]. The redesigned software builds on this work to provide general purpose methods that can be applied to many sensor placement optimization problems. The methods in Chama include the ability to define sensor models, which can be stationary or mobile, and extract the simulation data needed for optimization. The optimization can be used to design sensor networks that minimize impact metrics or maximize coverage given a set budget. Furthermore, the software includes the ability to add side constraints to the optimization formulations that enforce the number of sensors allowed within user-defined sets. This capability allows the user to find optimal sensor placements that, for example, adhere to spacing requirements or use a specified mix of sensor technologies. For more information, go to http://chama.readthedocs.io.

Coramin

Coramin is a Pyomo-based Python package that provides tools for developing tailored algorithms for mixed-integer nonlinear programming problems. The software includes classes for managing and refining convex relaxations of nonconvex constraints. These classes provide methods for updating the relaxation based on new variable bounds, creating and managing piecewise relaxations (for multi-tree based algorithms), and adding outer-approximation based cuts for convex or concave constraints. These classes inherit from Pyomo Blocks, so they can be easily integrated with Pyomo models. Additionally, Coramin has functions for automatically generating convex relaxations of general Pyomo models. Coramin also has tools for domain reduction, including a parallel implementation of optimization-based bounds tightening (OBBT) and various OBBT filtering techniques. For more information, visit https://github.com/coramin/coramin.
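Domain reduction of the kind Coramin (and, later in this article, SUSPECT) performs can be illustrated with a stdlib-only sketch of feasibility-based bounds tightening on a single linear constraint. This is only the simplest flavor of the idea: Coramin's OBBT solves auxiliary optimization problems per variable and handles nonlinear structure, whereas the function below (our own, hypothetical helper) just propagates interval arithmetic.

```python
def fbbt_linear(coefs, lb, ub, rhs):
    """Feasibility-based bounds tightening for sum_i coefs[i]*x[i] <= rhs.

    Returns tightened (lb, ub) lists. An illustrative, stdlib-only sketch
    of the domain-reduction pass that OBBT-style methods generalize.
    """
    lb, ub = list(lb), list(ub)
    for i, a in enumerate(coefs):
        if a == 0:
            continue
        # Smallest possible contribution of all the other terms.
        rest_min = sum(min(c * lb[j], c * ub[j])
                       for j, c in enumerate(coefs) if j != i)
        residual = rhs - rest_min   # budget left for term i
        if a > 0:
            ub[i] = min(ub[i], residual / a)   # a*x_i <= residual
        else:
            lb[i] = max(lb[i], residual / a)
    return lb, ub

# 2x + 3y <= 6 with 0 <= x, y <= 10 forces x <= 3 and y <= 2.
lb, ub = fbbt_linear([2.0, 3.0], [0.0, 0.0], [10.0, 10.0], 6.0)
```

Tighter variable bounds, in turn, tighten the convex relaxations built from them, which is why relaxation-management classes expose an "update on new bounds" method.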
Pyomo.DAE

Pyomo.DAE is a Pyomo extension for representing and solving optimization problems that include systems of differential algebraic equations (DAEs) as constraints. It introduces new modeling components to the Pyomo environment for representing derivatives and continuous domains, and provides a high degree of modeling flexibility in the types of differential equations that can be represented. Pyomo.DAE also includes model transformations which use simultaneous discretization approaches to transform a DAE model into a finite-dimensional algebraic model that can be solved with off-the-shelf solvers. More recently, a simulator interface was added to Pyomo.DAE which allows several existing ODE and DAE integrators to be applied to Pyomo.DAE models. Simulation results can be used to initialize a discretized Pyomo.DAE model before sending the model to an optimization solver, creating a streamlined workflow where the same Pyomo.DAE model can be used for both simulation and optimization. Pyomo.DAE has also been shown to be compatible with several other Pyomo packages, including PySP and Pyomo.GDP, allowing novel and complex optimization problems to be rapidly prototyped and solved [9]. For more information, visit https://pyomo.readthedocs.io/en/stable/modeling_extensions/dae.html.

EGRET

EGRET is a Python-based package for power grid operations optimization, based on Pyomo. EGRET is designed to quickly facilitate high-level analysis (e.g., as an engine for expressing and solving well-known and/or state-of-the-art optimization models), while simultaneously providing flexibility for researchers to rapidly explore new optimization models in this domain. Key features include a full range of Pyomo models for the unit commitment and economic dispatch optimization problems – which reside at the core of power systems operations in practice – in addition to related power flow problems (e.g., DCOPF and ACOPF). EGRET is organized to allow for a range of problem formulations, approximations, and relaxations, allows for generic handling of data across distinct optimization models, and provides a declarative model representation to support the development of advanced formulations. For more information, we refer to https://github.com/grid-parity-exchange/Egret.

GDP and GDPOpt

Pyomo provides the Pyomo.GDP extension to support the direct expression of structured disjunctive models through the Generalized Disjunctive Programming paradigm. Included with the GDP extension are standard reformulations for automatically generating several Mixed-Integer (Nonlinear) Programming (MI(N)LP) reformulations of the original GDP model, including the standard Big-M and Hull relaxations, as well as several iterative relaxations that include combinations of basic steps and cutting planes to generate compact, tight relaxations. Beyond relaxations, we have also developed the GDPOpt solver, which directly optimizes GDP models using logic-based decomposition approaches, thereby avoiding the reformulation of the entire model as an MI(N)LP problem. Usage and implementation details for GDPOpt can be found in [4].
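The Big-M reformulation that Pyomo.GDP automates can be seen on the simplest possible disjunction. The helper below is our own hand-rolled illustration (not Pyomo.GDP's API): for a variable known to lie in [lo, hi] when active, the bound-based Big-M constraints lo*z <= x <= hi*z force x = 0 when the indicator z is 0 and restore the original interval when z is 1.

```python
def make_bigm_check(lo, hi):
    """Feasibility check for the bound-based Big-M reformulation of
    the disjunction  [z = 0 -> x = 0]  OR  [z = 1 -> lo <= x <= hi]:

        lo * z <= x <= hi * z,  z in {0, 1}.

    A hypothetical, hand-rolled sketch; Pyomo.GDP applies this (and the
    tighter hull reformulation) automatically to general disjunctions,
    and falls back to a user-supplied large constant M when finite
    variable bounds are unavailable."""
    tol = 1e-9
    def ok(x, z):
        return lo * z - tol <= x <= hi * z + tol
    return ok

ok = make_bigm_check(1.0, 4.0)
# z = 0 forces x = 0; z = 1 restores the original interval [1, 4].
results = [ok(0.0, 0), ok(2.0, 0), ok(2.0, 1), ok(5.0, 1)]
```

The quality of the relaxation hinges on how tight lo, hi (or M) are, which is one reason the bounds-tightening tools described elsewhere in this article matter in practice.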
Prescient

Prescient is a power systems operations modeling tool with both production cost and resilience analysis capabilities. Prescient provides three primary capabilities: (1) probabilistic operations scenario construction, (2) unit commitment and dispatch optimization, and (3) rolling horizon simulation. Prescient is written entirely in Python. All optimization models, including those for probabilistic scenario construction and commitment and dispatch, are modeled in and solved via Pyomo; commitment and dispatch optimization models were recently migrated to EGRET. Prescient's probabilistic scenario construction capability automatically produces scenarios with attached probabilities from historical forecasts for load, solar, and/or wind power production and corresponding time-correlated actuals. Prescient provides both deterministic and stochastic commitment and dispatch capabilities. Stochastic variants of commitment and dispatch models are very difficult to solve in practice; Prescient leverages Pyomo's PySP [10] library for scalable solution of stochastic programs via scenario-based decomposition. Prescient can leverage either commercially available MILP solvers such as Gurobi and CPLEX, or open source alternatives such as GLPK and CBC. Prescient development is being funded in part by the US Department of Energy's North American Energy Resilience Model (NAERM) initiative, in addition to numerous R&D projects at Sandia National Laboratories. For additional information regarding Prescient capabilities, we refer to [1].

IDAES

The U.S. Department of Energy's Institute for the Design of Advanced Energy Systems (IDAES) was formed in 2016 to develop new advanced process systems engineering capabilities to support the design and optimization of innovative new energy processes that go beyond current equipment/process constraints. In particular, the IDAES Computational Platform provides an extensible, equation-oriented process modeling library built on Pyomo to address challenges in formulating, manipulating, and solving large, complex, structured process optimization problems. The IDAES Computational Platform leverages numerous Pyomo capabilities and related packages, including:
• block-oriented modeling for generating unit operations models,
• generalized disjunctive programming and the GDPOpt solver for expressing and solving conceptual design problems,
• the Parmest parameter estimation package for processing raw plant data,
• a trust region-based algorithm for optimizing "grey box" models,
• the Pyomo.DAE package for expressing dynamic process models,
• machine learning and surrogate modeling capabilities for managing models across multiple scales and fidelities and developing thermodynamic, physical property, and kinetic submodels from experimental data, and
• Prescient for grid/plant interaction studies.

By combining these and other capabilities, IDAES builds on multiple advances in modeling, algorithms, and hardware to transform the paradigm of process modeling and simulation into one of modeling and optimization. The IDAES framework is currently being used to increase the efficiency and operational flexibility of existing power plants, accelerating the identification and development of next-generation energy technologies. IDAES is also being extended to enable modeling and optimization of advanced water treatment and desalination systems. For more information visit https://idaes.org/.

Kernel

The pyomo.kernel sub-package is an experimental modeling layer designed to provide a more pythonic and object-oriented approach to expressing Pyomo models. It includes the basic set of modeling components necessary to express algebraic models, redesigned from the ground up to be easier to use and extend. Features currently available in pyomo.kernel include:
1. an improved set of tools for modeling piecewise-linear functions, including support for multivariate functions,
2. reduced memory usage for highly structured, block-hierarchical models,
3. specialized constraint classes that can reduce model build time by several factors,
4. baked-in support for user-defined subclasses of modeling components and containers, and
5. direct support for conic constraint types recognized by MOSEK [8].

More information about pyomo.kernel can be found at https://pyomo.readthedocs.io/en/stable/library_reference/kernel/index.html.

Parmest

The Pyomo contributed package Parmest [7] facilitates model-based parameter estimation along with characterization of uncertainty associated with the estimates, such as confidence regions around the parameter estimates. Parmest includes the following capabilities:
1. model-based parameter estimation using experimental data,
2. bootstrap resampling to calculate α-level confidence regions based on single- or multi-variate distributions, and
3. likelihood ratio tests to identify parameter values within a confidence region using the χ² distribution.

The software also facilitates the use of standard data formats, graphics, and parallel processing. More information can be found at https://pyomo.readthedocs.io/en/stable/contributed_packages/parmest/index.html.

PyNumero

Numerical optimization has proven to be an efficient tool for characterizing, designing, and operating chemical processes. Efficient optimization algorithms are typically complex code bases written in low-level compiled programming languages. However, this complexity also impedes the development of new algorithms, requiring significant programming and software engineering expertise and a steep learning curve. This is especially true when developing decomposition approaches to solve large-scale structured problems in parallel. To address and mitigate these challenges, we have been developing PyNumero, a Python package that provides a high-level programming framework for rapid development of optimization algorithms and analysis techniques for nonlinear problems. The package allows algorithm development using high-level features of the Python programming language without making large sacrifices in computational performance. It combines the capabilities of the modeling language Pyomo with efficient libraries like the AMPL Solver Library (ASL), the Harwell Subroutine Library (HSL), the Message Passing Interface (mpi4py), and NumPy/SciPy. This combination makes PyNumero an excellent tool to develop numerical optimization algorithms that are interfaced with Pyomo. Furthermore, PyNumero performs all linear algebra operations in compiled code, and is designed to avoid marshaling of data between the C and Python environments.

mpi-sppy

PySP [10] allows users to model optimization problems where the input data is provided as a set of scenarios, each with an attached probability. The data are organized into a tree that corresponds to multiple decision stages. The new software won't be called PySP2 because it is a major departure from PySP. The new software is much more parallelized and its user interface is much more programmatic. Additionally, interfaces with PySP can make use of parameter vector scenarios, each with an attached probability estimate. One major feature it shares with PySP, and most other modeling systems for optimization under uncertainty, is the ease with which one can start with a deterministic model and "add stochastics." This is done by defining a function that instantiates a Pyomo model; the function takes arguments that allow it to give the data for the appropriate scenario. One of the novel features of the new software is support for a virtual scenario tree to enhance scalability for very large numbers of scenarios. An alpha release is planned for late Spring.

SUSPECT

SUSPECT [3] is an open source toolkit (https://github.com/cog-imperial/suspect) that symbolically analyzes mixed-integer nonlinear optimization problems formulated using Pyomo. SUSPECT works on a directed acyclic graph representation of the entire optimization problem to perform bounds tightening, bound propagation, monotonicity detection, and convexity detection. The tree-walking rules in SUSPECT balance the need for lightweight computation with effective special structure detection. SUSPECT can be used as a standalone tool or as a Python library to be integrated in other tools or solvers.

Contributors to the software described

Michael Bynum, Anya Castillo, William E. Hart, Katherine A. Klise, Bernard Knueven, Carl D. Laird, Bethany L. Nicholson, John D. Siirola – Sandia National Laboratories
Qi Chen – Carnegie Mellon University
Gabriel A. Hackebeil – Nokia Deepfield
Emma Johnson, Christopher Muir – Georgia Tech
Anthony Burgard, John Eslick, Jaffer Ghouse, Andrew Lee, David Miller – National Energy Technology Laboratory
Francesco Ceccon, Ruth Misener – Imperial College
David Mildebrath – Rice University
Jose S. Rodriguez – Purdue University
David L. Woodruff – UC Davis
Jean-Paul Watson – Lawrence Livermore National Laboratory

References

[1] C. Barrows, A. Bloom, A. Ehlen, J. Ikaheimo, J. Jorgenson, D. Krishnamurthy, J. Lau, B. McBennett, M. O'Connell, E. Preston, A.S. Staid, Gord Stephen, and J.P. Watson. The IEEE reliability test system: A proposed 2019 update. IEEE Transactions on Power Systems, 35(1):119–127, 2020.
[2] Jonathan Berry, William E. Hart, Cynthia A. Phillips, James G. Uber, and Jean-Paul Watson. Sensor placement in municipal water networks with temporal integer programming models. Journal of Water Resources Planning and Management, 132(4):218–224, 2006.
[3] Francesco Ceccon, John D. Siirola, and Ruth Misener. SUSPECT: MINLP special structure detector for Pyomo, 2019.
[4] Qi Chen, Emma S. Johnson, John D. Siirola, and Ignacio E. Grossmann. Pyomo.GDP: Disjunctive models in Python. In Mario R. Eden, Marianthi G. Ierapetritou, and Gavin P. Towler, editors, 13th International Symposium on Process Systems Engineering (PSE 2018), volume 44 of Computer Aided Chemical Engineering, pages 889–894. Elsevier, 2018.
[5] William E. Hart, Carl D. Laird, Jean-Paul Watson, David L. Woodruff, Gabriel A. Hackebeil, Bethany L. Nicholson, and John D. Siirola. Pyomo – Optimization Modeling in Python, volume 67. Springer, second edition, 2017.
[6] Katherine A. Klise, Bethany Nicholson, and Carl D. Laird. Sensor placement optimization using Chama. Number SAND2017-11472. Sandia National Laboratories, Albuquerque, NM, 2017.
[7] Katherine A. Klise, Bethany L. Nicholson, Andrea Staid, and David L. Woodruff. Parmest: Parameter estimation via Pyomo. In Salvador Garcia Muñoz, Carl D. Laird, and Matthew J. Realff, editors, Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design, volume 47 of Computer Aided Chemical Engineering, pages 41–46. Elsevier, 2019.
[8] APS Mosek. The MOSEK optimization software. Online at http://www.mosek.com, 54(2-1):5, 2010.
[9] Bethany Nicholson and John Siirola. A framework for modeling and optimizing dynamic systems under uncertainty. Computers & Chemical Engineering, 114:81–88, 2018.
[10] Jean-Paul Watson, D.L. Woodruff, and W.E. Hart. PySP: Modeling and solving stochastic programs in Python. Mathematical Programming Computation, 4:109–149, 2012.
Research Highlight: A Unified Approach to Mixed-Integer Optimization: Nonlinear Formulations and Scalable Algorithms
Ryan Cory-Wright, Jean Pauphilet
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, USA
It is an honor to receive the ICS Student Paper Award. We would like to thank the committee, chaired by Frank Curtis and including Georgina Hall and Sergiy Butenko, for selecting our paper. Additionally, we would like to thank our advisor Dimitris Bertsimas for his support. This summary is based on our joint work [6].

1. Introduction

Many important problems from the operations research literature exhibit logical relationships between a continuous variable x and a binary variable z of the form "x = 0 if z = 0". Among others, start-up costs in machine scheduling problems, financial transaction costs, cardinality constraints, and fixed costs in facility location problems exhibit this relationship. Since the work of Glover [19], this relationship is usually enforced through a big-M constraint of the form −Mz ≤ x ≤ Mz for a sufficiently large constant M > 0. Glover's work has been so influential that big-M constraints are now considered intrinsic components of the initial formulations, to the extent that textbooks in the field introduce facility location, network design, or sparse portfolio problems with big-M constraints by default, although they are actually reformulations of logical constraints.

In this work, we adopt a different perspective on the big-M paradigm, viewing it as a regularization term rather than a modeling trick. Under this lens, we show that regularization drives the computational tractability of problems with logical constraints, explore alternatives to the big-M paradigm, and propose a numerically efficient algorithmic strategy which solves a broad class of problems with logical constraints.

1.1. Problem Formulation and Main Contributions

We consider optimization problems which unfold over two stages. In the first stage, a decision-maker activates binary variables, while satisfying resource budget constraints and incurring activation costs. Subsequently, in the second stage, the decision-maker optimizes over the continuous variables. Formally, we consider the problem

    min_{z ∈ Z, x ∈ ℝⁿ}  c⊤z + g(x) + Ω(x)   s.t.  x_i = 0 if z_i = 0, ∀i ∈ [n],    (1)

where Z ⊆ {0,1}ⁿ, c ∈ ℝⁿ is a cost vector, g(·) is a generic convex function, and Ω(·) is a convex regularization function; we formally state its structure in Assumption 1.

In this work, we provide three main contributions. First, we reformulate the logical constraint "x_i = 0 if z_i = 0" in a non-linear way, by substituting z_i x_i for x_i in Problem (1). Second, we leverage the regularization term Ω(x) to derive a tractable reformulation of Problem (1). Finally, by invoking strong duality, we reformulate Problem (1) as a mixed-integer saddle-point problem, which is solvable via outer approximation.

Observe that the structure of Problem (1) is quite general, as the feasible set Z can capture known lower and upper bounds on z, relations between different z_i's, or a cardinality constraint e⊤z ≤ k. Moreover, constraints of the form x ∈ 𝒳, for some convex set 𝒳, can be encoded within the domain of g, by defining g(x) = +∞ if x ∉ 𝒳. As a result, (1) encompasses a large number of problems from the operations research literature, including the network design problem described in Example 1.

Example 1. Network design is an important example of problems of the form (1). Given a set of m nodes, the network design problem consists of constructing edges to minimize the construction plus flow transportation cost. Let E denote the set of all potential edges and let n = |E|. Then, the network design problem is given by:

    min_{z ∈ Z, x ≥ 0}  c⊤z + ½ x⊤Qx + d⊤x
    s.t.  Ax = b,                                            (2)
          x_e = 0 if z_e = 0, ∀e ∈ E,

where A ∈ ℝ^{m×n} is the flow conservation matrix, b ∈ ℝ^m is the vector of external demands, and Q ∈ ℝ^{n×n}, d ∈ ℝⁿ define the quadratic and linear costs of flow circulation. We assume that Q ⪰ 0 is positive semi-definite. Moreover, inequalities of the form ℓ ≤ z ≤ u can be incorporated within Z to account for existing/forbidden edges in the network. Problem (2) is of the same form as (1) with

    g(x) + Ω(x) := ½ x⊤Qx + d⊤x  if Ax = b, x ≥ 0;   +∞  otherwise.

Observe that we can generalize this model to account for multiple commodities and edge capacities.

1.2. Structure

We propose a unifying framework to address mixed-integer optimization problems (MIOs), and jointly discuss modeling choices and numerical algorithms.
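As an illustrative aside (our sketch, not code from [6]), the two-stage structure of Problem (1) can be made concrete on a toy instance: take g(x) = ½‖Ax − b‖² with a ridge penalty Ω(x) = (1/2γ)‖x‖², so that the second-stage cost for a given z is a ridge-regularized least-squares problem restricted to the support of z, solvable in closed form. The data A, b, the cost vector c, and the budget k below are all hypothetical.

```python
import itertools
import numpy as np

def inner_cost(S, A, b, gamma):
    """Second-stage cost for support S = {i : z_i = 1}: the ridge-regularized
    least-squares problem restricted to S, solved via its normal equations."""
    if len(S) == 0:
        return 0.5 * b @ b  # no active variables: x = 0
    As = A[:, S]
    xs = np.linalg.solve(As.T @ As + np.eye(len(S)) / gamma, As.T @ b)
    r = As @ xs - b
    return 0.5 * r @ r + 0.5 / gamma * xs @ xs

def brute_force(A, b, c, gamma, k):
    """Minimize (activation cost) + (second-stage cost) over all z with
    at most k active coordinates, by enumeration (toy sizes only)."""
    n = A.shape[1]
    best = (np.inf, None)
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), r) for r in range(k + 1)):
        cost = sum(c[i] for i in S) + inner_cost(list(S), A, b, gamma)
        best = min(best, (cost, S))
    return best
```

For realistic sizes the enumeration is of course replaced by the decomposition into a discrete master problem and continuous subproblems developed below; the sketch only illustrates how (1) separates into a choice of z followed by a convex continuous problem.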
Preprint submitted to Elsevier, January 25, 2020.

In Section 2, we identify a general class of MIOs, which encompasses sparse regression, sparse portfolio selection, facility location, and network design problems¹. For this class, we discuss how imposing either big-M or ridge regularization accounts for non-linear relations between continuous and binary variables in a tractable fashion. We also establish that regularization controls the convexity and smoothness of the objective.

In Section 3, we propose a conjunction of general-purpose numerical algorithms to solve Problem (1). The backbone of our approach is an outer-approximation framework, enhanced with first-order methods to solve the Boolean relaxations and obtain improved lower bounds, and certifiably near-optimal warm starts via randomized rounding. We also connect our approach to the perspective cut approach [14] from a theoretical and implementation standpoint.

In Section 4, we demonstrate that algorithms derived from our framework can outperform state-of-the-art solvers. On network design problems with 100s of nodes, we improve the objective value of the returned solution by 5 to 40%, and our edge increases as the problem size increases. On empirical risk minimization problems, our method with ridge regularization is able to accurately select features among 100,000s (resp. 10,000s) of covariates for regression (resp. classification) problems, with higher accuracy than both Lasso and non-convex penalties from the statistics literature. For sparse portfolio selection, we solve to provable optimality problems one order of magnitude larger than previous attempts.

¹Our framework also applies to unit commitment and binary quadratic optimization problems among others (omitted for space; see [6]).

Notation: We use non-bold face characters to denote scalars, lowercase bold faced characters such as x to denote vectors, uppercase bold faced characters such as X to denote matrices, and calligraphic characters such as 𝒳 to denote sets. We let e denote a vector of all 1's, and 0 denote a vector of all 0's. If x is an n-dimensional vector, then Diag(x) denotes the n × n diagonal matrix whose diagonal entries are given by x. Finally, we let ℝⁿ₊ denote the n-dimensional nonnegative orthant.

2. Framework and Examples

In this section, we present the family of problems to which our analysis applies, discuss the role played by regularization, and provide some examples from the literature.

2.1. Examples

Problem (1) has a two-stage structure which comprises first "turning on" some indicator variables z, and second solving a continuous optimization problem over the active components of x. Precisely, (1) can be viewed as a discrete problem:

    min_{z ∈ Z}  c⊤z + f(z),    (3)

where the inner minimization problem

    f(z) := min_{x ∈ ℝⁿ}  g(x) + Ω(x)   s.t.  x_i = 0 if z_i = 0, ∀i ∈ [n],    (4)

yields a best choice of x given z. As we now illustrate, a number of problems of practical interest exhibit this structure.

Example 2. For the network design example (2), we have

    f(z) := min_{x ≥ 0 : Ax = b}  ½ x⊤Qx + d⊤x   s.t.  x_e = 0 if z_e = 0, ∀e ∈ E.

2.1.1. Sparse Empirical Risk Minimization
Given a matrix of covariates X ∈ ℝ^{n×p} and a response vector y ∈ ℝⁿ, the sparse empirical risk minimization problem seeks a vector w which explains the response in a compelling manner:

    f(z) := min_{w ∈ ℝᵖ}  Σ_{i=1}^n ℓ(y_i, w⊤x_i) + (1/2γ)‖w‖₂²
            s.t.  w_j = 0 if z_j = 0, ∀j ∈ [p],    (5)

over Z = {z ∈ {0,1}ᵖ : e⊤z ≤ k}, where ℓ is an appropriate convex loss function; see Bertsimas et al. [8, Table 1] for examples.

2.1.2. Sparse Portfolio Selection
Given an expected marginal return vector μ ∈ ℝⁿ, estimated covariance matrix Σ ∈ ℝ^{n×n}, uncertainty budget parameter σ > 0, cardinality budget parameter k ∈ {2, ..., n−1}, linear constraint matrix A ∈ ℝ^{m×n}, and right-hand-side bounds l, u ∈ ℝ^m, investors determine an optimal allocation of capital between assets by minimizing, over Z = {z ∈ {0,1}ⁿ : e⊤z ≤ k},

    f(z) = min_{x ∈ ℝⁿ₊}  σ x⊤Σx − μ⊤x
           s.t.  l ≤ Ax ≤ u,  e⊤x = 1,             (6)
                 x_i = 0 if z_i = 0, ∀i ∈ [n].

2.1.3. Facility Location
Given a set of n facilities and m customers, the facility location problem consists of constructing facilities i = 1, ..., n at cost c_i in order to satisfy demand at minimal cost, i.e.,

    min_{z ∈ {0,1}ⁿ}  min_{X ∈ ℝ^{n×m}₊}  c⊤z + Σ_{j=1}^m Σ_{i=1}^n C_ij X_ij
    s.t.  Σ_{j=1}^m X_ij ≤ U_i, ∀i ∈ [n],   Σ_{i=1}^n X_ij = d_j, ∀j ∈ [m],    (7)
          X_ij = 0 if z_i = 0, ∀i ∈ [n], ∀j ∈ [m],

where X_ij corresponds to the quantity produced in facility i and shipped to customer j at a marginal cost of C_ij, each facility i has a maximum output capacity of U_i, and each customer j has a demand of d_j. In the uncapacitated case where U_i = ∞, the inner minimization problems decouple into independent knapsack problems for each customer j.

2.2. A Regularization Assumption

When we stated Problem (1), we assumed that its objective function consists of a convex function g(x) plus a regularization term Ω(x). We now formalize this assumption:

Assumption 1. In (1), the regularization term Ω(x) is one of:
• a big-M penalty, Ω(x) = 0 if ‖x‖_∞ ≤ M and +∞ otherwise;
• a ridge penalty, Ω(x) = (1/2γ)‖x‖₂².

This decomposition often constitutes a modeling choice in itself. We now illustrate this via the network design example².

²More sophisticated regularization extraction schemes are possible [see 28].

Example 3. In the network design example (2), given the flow conservation structure Ax = b, we have that x ≤ Me, where M = Σ_{i : b_i > 0} b_i. In addition, if Q ≻ 0, then the objective function naturally contains a ridge regularization term with 1/γ equal to the smallest eigenvalue of Q. Moreover, it is possible to obtain a tighter natural ridge regularization term by solving the following auxiliary semidefinite optimization problem a priori

    max_{q ≥ 0}  e⊤q   s.t.  Q − Diag(q) ⪰ 0,

and using q_i as the ridge regularizer for each index i [15].

Big-M constraints are often considered to be a modeling trick. However, our framework demonstrates that imposing either big-M constraints or a ridge penalty is actually a regularization method. Interestingly, ridge regularization accounts for the relationship between the binary and continuous variables just as well as big-M regularization, without performing an algebraic reformulation of the logical constraints³.

³Specifically, ridge regularization enforces logical constraints through perspective functions, as is made clear in Section 3.3.

Conceptually, both regularization functions are equivalent to a soft or hard constraint on the continuous variables x. However, they admit practical differences: For big-M regularization, there usually exists a finite value M₀, typically unknown a priori, such that if M < M₀, the regularized problem is infeasible. Alternatively, for every value of the ridge regularization parameter γ, if the original problem is feasible then the regularized problem is also feasible. Consequently, if there is no natural choice of M, then imposing ridge regularization may be less restrictive than imposing big-M regularization. However, for any γ > 0, the objective of the optimization problem with ridge regularization is different from its unregularized limit as γ → ∞, while for big-M regularization, there usually exists a finite value M₁ above which the two objective values match.

2.3. Duality to the Rescue

In this section, we derive Problem (4)'s dual and reformulate f(z) as a maximization problem. This reformulation is significant for two reasons: First, it leverages a non-linear reformulation of the logical constraints "x_i = 0 if z_i = 0" by introducing additional variables v_i such that v_i = z_i x_i. Second, it proves that the regularization term Ω(x) drives the convexity and smoothness of f(z), and thereby drives the computational tractability of the problem. To derive (4)'s dual, we require:

Assumption 2. For each subproblem generated by f(z), where z ∈ Z, either the problem is infeasible, or strong duality holds.

Note that all four problems stated in Section 2.1 satisfy Assumption 2, as their inner problems are convex quadratics with linear constraints [see 10, Section 5.2.3]. Under Assumption 2, we now reformulate Problem (3) as a saddle-point problem:

Theorem 1. Under Assumption 2, Problem (3) is equivalent to the following problem:

    min_{z ∈ Z} max_{α ∈ ℝⁿ}  c⊤z + h(α) − Σ_{i=1}^n z_i Ω*(α_i),    (8)

where h(α) := inf_v g(v) − v⊤α is, up to a sign, the Fenchel conjugate of g [see 10, Chap. 3.3], and

    Ω*(α) := M|α|       for the big-M penalty,
    Ω*(α) := (γ/2) α²   for the ridge penalty.

Example 4. For the network design problem (2), we have

    h(α) = min_{x ≥ 0 : Ax = b}  ½ x⊤Qx + (d − α)⊤x
         = max_{λ₀ ≥ 0, p}  b⊤p − ½ ‖Q^{−1/2}(A⊤p − d + α + λ₀)‖₂².

Introducing ξ = Q^{−1/2}(A⊤p − d + α + λ₀), we have

    h(α) = max_{ξ, p}  b⊤p − ½ ‖ξ‖₂²   s.t.  Q^{1/2} ξ ≥ A⊤p − d + α.

Hence, (2) is equivalent to minimizing over z ∈ Z

    c⊤z + f(z) = max_{α, ξ, p}  c⊤z + b⊤p − ½ ‖ξ‖₂² − Σ_{j=1}^n z_j Ω*(α_j)
                 s.t.  Q^{1/2} ξ ≥ A⊤p − d + α.

Theorem 1 reformulates f(z) in (3) as the problem

    f(z) = max_{α ∈ ℝⁿ}  h(α) − Σ_{i=1}^n z_i Ω*(α_i),    (9)

for any feasible binary z ∈ Z. The regularization term Ω will be instrumental in our numerical strategy, for it directly controls both the convexity and smoothness of f.

Convexity: f(z) is convex in z as a point-wise maximum of linear functions of z. In addition, denoting α*(z) a solution of (9), we have the following lower-approximation

    f(z̃) ≥ f(z) + ∇f(z)⊤(z̃ − z), ∀z̃ ∈ Z,    (10)

where [∇f(z)]_i := −Ω*(α*_i(z)) is a sub-gradient of f at z.

Smoothness: f(z) is smooth, in the sense of Lipschitz continuity, which is a crucial property for deriving bounds on the Boolean relaxation gap and designing local search heuristics in Section 3. The following proposition follows from Theorem 1:

Proposition 1. For any feasible z and z′ in Conv(Z),
(a) With big-M regularization,  f(z′) ≥ f(z) − M Σ_{i=1}^n (z_i − z′_i) |α*_i(z′)|.
(b) With ridge regularization,  f(z′) ≥ f(z) − (γ/2) Σ_{i=1}^n (z_i − z′_i) α*_i(z′)².

Proposition 1 demonstrates that, as the coordinates of α*(z) are uniformly bounded with respect to z, f(z) is Lipschitz continuous, with a constant proportional to M (resp. γ) in the big-M (resp. ridge) case.

2.4. Examples - Continued

We now derive the dual reformulation (8) for the examples from Section 2.1.

2.4.1. Sparse Empirical Risk Minimization
Let ℓ* denote the Fenchel conjugate of the loss function ℓ with respect to its second argument (see [8, Table 1] for examples). Then, we can rewrite Problem (5) as [see 8, for a proof]:

    f(z) = max_{β ∈ ℝᵖ}  h(β) − Σ_{j=1}^p z_j Ω*(β_j),

where h(β) := max_{α ∈ ℝⁿ : X⊤α = β}  − Σ_{i=1}^n ℓ*(y_i, α_i).

2.4.2. Sparse Portfolio Selection
After either imposing an additional ridge regularization penalty term (1/2γ)‖x‖₂², or decomposing Σ into a diagonal matrix D plus a low-rank matrix V⊤FV and using the term x⊤Dx as a ridge regularization with a different value of γ for each coordinate x_i, we can rewrite Problem (6) as:

    f(z) = max_{w ∈ ℝⁿ, α ∈ ℝⁿ, λ ∈ ℝ, β_l, β_u ∈ ℝ^m₊}  y⊤α − ½‖α‖₂² + λ + β_l⊤l − β_u⊤u − (γ/2) Σ_i z_i w_i²
           s.t.  w ≥ X⊤α + λe + A⊤(β_l − β_u) − d,

where X := √Σ denotes the square root of Σ and y, d are the projections of μ onto the span and nullspace of X [see 4].

2.4.3. Facility Location
The facility location problem (7) admits a natural big-M regularization by taking M = min(U_i, d_j) for each X_ij, and its dual formulation (8) involves:

    h(α) = min_{X ∈ ℝ^{n×m}₊}  ⟨C − α, X⟩   s.t.  Xe ≤ U,  X⊤e = d
         = max_{p ∈ ℝ^m, ν ∈ ℝⁿ₊}  d⊤p − U⊤ν   s.t.  e p⊤ − ν e⊤ ≤ C − α.

3. An Efficient Numerical Approach

In this section, we present an efficient numerical approach to solve (8). The backbone is an outer-approximation strategy to solve the problem exactly. We also use information from the problem's Boolean relaxation to improve the duality gap.

3.1. Overall Outer-Approximation Scheme

Theorem 1 reformulates the function f(z) as an inner maximization problem, and demonstrates that f(z) is convex in z, meaning a linear outer approximation provides a valid under-estimator of f(z), as outlined in Equation (10). Consequently, a valid numerical strategy for minimizing f(z) comprises iteratively minimizing a piecewise linear lower-approximation of f and refining this approximation at each step until some approximation error ε is reached, as described in Algorithm 1. This scheme was originally proposed for continuous variables by Kelley [22], and later extended to binary variables by Duran and Grossmann [12], who provide a proof of termination in a finite, yet exponential in the worst case, number of iterations.

Algorithm 1 Outer-approximation scheme
Require: Initial solution z¹
  t ← 1
  repeat
    Compute z^{t+1}, η^{t+1} solution of
      min_{z ∈ Z, η}  c⊤z + η   s.t.  η ≥ f(z^s) + ∇f(z^s)⊤(z − z^s), ∀s ∈ [t]
    Compute f(z^{t+1}) and ∇f(z^{t+1})
    t ← t + 1
  until f(z^t) − η^t ≤ ε
  return z^t

To avoid solving a mixed-integer linear optimization problem at each iteration, as suggested in the pseudo-code, this strategy can be integrated within a single branch-and-bound procedure using lazy callbacks. Lazy callbacks are now standard tools in commercial solvers such as Gurobi and CPLEX and provide significant speed-ups for outer-approximation algorithms. With this implementation, the commercial solver constructs a single branch-and-bound tree and generates a new cut when at a feasible solution z.

3.2. Improving the Lower-Bound: A Boolean Relaxation

To certify optimality, high-quality lower bounds are of interest and can be obtained by relaxing the constraint z ∈ {0,1}ⁿ to z ∈ [0,1]ⁿ. The Boolean relaxation of (3) becomes:

    min_{z ∈ Conv(Z)}  c⊤z + f(z),

which can be solved using Kelley's algorithm [22], which is a continuous analog of Algorithm 1.
Stabilization strategies have been empirically successful in accelerating the convergence of Kelley's algorithm, as recently demonstrated on uncapacitated facility location problems by Fischetti et al. [13]. However, for Boolean relaxations, Kelley's algorithm computes f(z) and ∇f(z) at dense vectors z, which is (sometimes substantially) more expensive than for sparse binary vectors z, unless each subproblem can be solved efficiently as in Fischetti et al. [13].

Alternatively, the continuous minimization problem admits a saddle-point reformulation

    min_{z ∈ Conv(Z)} max_{α ∈ ℝⁿ}  c⊤z + h(α) − Σ_{i=1}^n z_i Ω*(α_i),    (11)

analogous to Problem (8). Under Assumption 2, we can further write the min-max relaxation formulation (11) as a non-smooth maximization problem

    max_{α ∈ ℝⁿ}  q(α),  with  q(α) := h(α) + min_{z ∈ Conv(Z)}  Σ_{i=1}^n (c_i − Ω*(α_i)) z_i,

and apply a projected sub-gradient ascent method as in Bertsimas et al. [8]. We refer to [3, Chapter 7.5] for a discussion on implementation choices regarding step-size schedule and stopping criterion, and to Renegar and Grimmer [26] for recent enhancements using restarting.

The benefits of solving the Boolean relaxation with these algorithms are threefold. First, it provides a lower bound on the objective value of the discrete optimization problem (3). Second, it generates valid lower approximations of f(z) to initialize the cutting-plane algorithm. Finally, it supplies a sequence of continuous solutions that can be rounded and polished to obtain good binary solutions. Indeed, the Lipschitz continuity of f(z) suggests that high-quality feasible binary solutions can be found in the neighborhood of a solution to the relaxation. We formalize this observation in the following theorem:

Theorem 2. Let z* denote a solution to the Boolean relaxation (11), R denote the indices of z* with fractional entries, and α*(z) denote a best choice of α for a given z. Suppose that for any z ∈ Z, |α*_j(z)| ≤ L. Then, a random rounding z of z*, i.e., z_j ∼ Bernoulli(z*_j), satisfies 0 ≤ f(z) − f(z*) ≤ ε with probability at least p = 1 − exp(−ε²/(2|R|σ²)), where

    σ² := 2M²L²     for the big-M penalty,
    σ² := ½ γ²L⁴    for the ridge penalty.

This result calls for multiple remarks:
• For ε > σ√(|R| ln|R|), we have that p > 0, which implies the existence of a binary ε-optimal solution in the neighborhood of z*, in turn bounding the integrality gap by ε. As a result, lower values of M or γ typically make the discrete optimization problem easier.
• A solution to the Boolean relaxation often includes some binary coordinates, i.e., |R| < n. In this situation, it is tempting to fix z_i = z*_i for i ∉ R and solve the master problem (3) over the coordinates in R. In general, this provides sub-optimal solutions. However, Theorem 2 quantifies the price of fixing variables and bounds the optimality gap by σ√(|R| ln|R|).
• In the above high-probability bound, we do not account for the feasibility of the randomly rounded solution z. Accounting for z's feasibility marginally reduces the above probability, as shown for discrete optimization problems by [25].
• If R is empty, by the probabilistic method, z* solves (8).

Under specific problem structure, other strategies might be more efficient than Kelley's method or the sub-gradient algorithm. For instance, if Z is a polyhedron, then, by strong duality, the inner minimization problem defining q(α) is a linear optimization problem that can be rewritten as a maximization problem. Moreover, although we only consider linear relaxations here, tighter bounds could be attained by taking a higher-level relaxation from a hierarchy, such as the Lasserre [23] hierarchy [see 24, for a comparison]. The main benefit of such a relaxation is that while the aforementioned Boolean relaxation only controls the first moment of the probability measure studied in Theorem 2, higher-level relaxations control an increasing sequence of moments and thereby provide non-worsening probabilistic guarantees for randomized rounding methods. However, the additional tightness of these bounds comes at the expense of solving relaxations with additional variables and constraints, yielding a sequence of ever-larger semidefinite optimization problems. Indeed, even the SDP relaxation which controls the first two moments of a randomized rounding method is usually intractable when n > 300 with current technology. For an analysis of higher-level relaxations in sparse regression problems, we refer the reader to Atamtürk and Gómez [2].

3.3. Relationship With Perspective Cuts

In this section, we connect the perspective cuts introduced by Frangioni and Gentile [14] with our framework and discuss the merits of both approaches. To the best of our knowledge, a connection between Boolean relaxations of the two approaches has only been made in the context of sparse regression, by Xie and Deng [27]. We first demonstrate that imposing the ridge regularization term Ω(x) = (1/2γ)‖x‖₂² naturally leads to the perspective formulation of Frangioni and Gentile [14]:

Theorem 3. Suppose that Ω(x) = (1/2γ)‖x‖₂² and that Assumption 2 holds. Then, Problem (8) is equivalent to the problem:

    min_{z ∈ Z} min_{x ∈ ℝⁿ}  c⊤z + g(x) + (1/2γ) Σ_{i=1}^n  { x_i²/z_i  if z_i > 0;  0  if z_i = 0 and x_i = 0;  +∞  otherwise }.    (12)

Theorem 3 follows from taking the dual of the inner maximization problem in Problem (9) (proof omitted). Note that the equivalence stated in Theorem 3 also holds for z ∈ Conv(Z). As previously observed in Aktürk et al. [1], Problem (12) can be formulated as a second-order cone problem (SOCP)

    min_{x ∈ ℝⁿ, z ∈ Z, θ ∈ ℝⁿ}  c⊤z + g(x) + (1/2γ) Σ_{i=1}^n θ_i
    s.t.  ‖(x_i, (θ_i − z_i)/2)‖₂ ≤ (θ_i + z_i)/2,  i.e.,  x_i² ≤ θ_i z_i, ∀i ∈ [n],    (13)

and solved by linearizing the SOCP constraints into so-called perspective cuts, i.e., θ_i ≥ x̄_i (2x_i − x̄_i z_i), ∀x̄ ∈ 𝒳̄, which have been extensively studied in the literature in the past fifteen years [14, 20]. Observe that by separating Problem (12) into master and subproblems, an outer approximation algorithm would yield the same cut (10) as in our scheme. However, while theoretically similar, there are subtle differences which make the computational performance of our proposal more attractive:
• Our approach is, strictly speaking, a generalized Benders decomposition scheme [see 18] which only includes cuts corresponding to optimal choices of x for a given incumbent solution z. Alternatively, the perspective cut approach is an outer approximation scheme in the sense of [12] which generates the optimal cut with respect to (z, x).
• Our outer-approximation scheme can easily be implemented within a standard integer optimization solver such as CPLEX or Gurobi using callbacks. Unfortunately, a full outer approximation approach for the perspective formulation requires a tailored branch-and-bound procedure [see 14, Section 3.1 for details]. In this regard, our approach is more practical.

4. Numerical Experiments

In this section, we numerically evaluate our cutting-plane algorithm, implemented in Julia 1.0 using CPLEX 12.8.0 and the Julia package JuMP.jl version 0.18.4 [11]. We compare our method against solving the natural big-M or MISOCP formulations directly, using CPLEX 12.8.0. All experiments were performed on one Intel Xeon E5-2690 v4 2.6GHz CPU core using 32 GB RAM. The big-M and MISOCP formulations are allocated 8 threads, unless explicitly stated otherwise. However, JuMP.jl version 0.18.4 does not allow our outer-approximation scheme to benefit from multi-threading.

4.1. Overall Empirical Performance Versus State-of-the-Art

In this section, we compare our approach to state-of-the-art methods, and demonstrate that our approach outperforms the state-of-the-art for several relevant problems.

4.1.1. Network Design
We begin by evaluating the performance of our approach for the multi-commodity network design problem (formulation omitted for space). We adapt the methodology of Günlük and Linderoth [20] and generate instances where each node i ∈ [m] is the unique source of one commodity (k = m). For each commodity j ∈ [m], we generate demands via:

    b^j_{j′} = ⌊U(5, 25)⌉, ∀j′ ≠ j,  and  b^j_j = − Σ_{j′ ≠ j} b^j_{j′},

where ⌊x⌉ is the closest integer to x and U(a, b) is uniform on [a, b]. We generate edge construction costs c_e uniformly on U(1, 4), and marginal flow circulation costs proportional to each edge length⁴. The discrete set Z contains constraints of the form z ≥ z₀, where z₀ is a binary vector which encodes existing edges. We generate graphs which contain a spanning tree plus pm additional randomly picked edges, with p ∈ [4], so that the initial network is sparse⁵ and connected. We also impose a cardinality constraint e⊤z ≤ (1 + 5%) z₀⊤e, which ensures that the network size increases by no more than 5%. For each edge, we impose a capacity u_e ∼ ⌊U(0.2, 1) B/A⌉, where B = Σ_{j=1}^m b_j is the total demand and A = (1 + p)m. We penalize the constraint x ≤ u via a penalty parameter equal to 1,000.

We apply our approach to large networks with 100s of nodes, i.e., 10,000s of edges, which is ten times larger than the state-of-the-art [21, 20], and compare the quality of the incumbent solutions after an hour. In 100 instances, our cutting-plane algorithm with big-M regularization provides a better solution 94% of the time, by 9.9% on average, and by up to 40% for the largest networks. For ridge regularization, the cutting-plane algorithm scales to higher dimensions than plain mixed-integer SOCP, returns solutions systematically better than those found by CPLEX (in terms of unregularized cost), by 11% on average, and usually outperforms big-M regularization, as reported in Table 1. Even artificially added, ridge regularization improves the tractability of outer approximation.

4.1.2. Sparse Empirical Risk Minimization
For sparse empirical risk minimization, our method with ridge regularization scales to regression problems with p = 100,000s of features and classification problems with p = 10,000s of features [8]. This constitutes a three-order-of-magnitude improvement over previous attempts using big-M regularization [5]. We also select features more accurately, as shown in Figure 1, which compares the accuracy of the features selected by the outer-approximation algorithm (in green) with those obtained from the Boolean relaxation (in blue) and other methods.

[Figure 1 (plots omitted): Accuracy (A) of the feature selection method as the number of samples n increases, for the outer-approximation algorithm (in green), the solution found by the subgradient algorithm (in blue), ElasticNet (in red), MCP (in orange), and SCAD (in pink) [see 8, for definitions]. Panels: (a) Regression, p = 20,000; (b) Classification, p = 10,000. Results are averaged over 10 instances of synthetic data with (SNR, p, k) = (6, 20000, 100) for regression (left) and (5, 10000, 100) for classification (right).]

4.1.3. Sparse Portfolio Selection
We applied our approach to sparse portfolio selection problems in Bertsimas and Cory-Wright [4] and, by introducing a
4Nodes are uniformly distributed over the unit square [0, 1]2. We fix the 5The number of initial edges is O(m). cost to be ten times the Euclidean distance. Table 1: Best solution found after one hour on network design instances with m nodes and (1 + p)m initial edges. We report improvement, i.e., the relative di↵erence between the solutions returned by CPLEX and the cutting-plane. Values are averaged over five randomly generated instances. For ridge regularization, we report the “unregularized” objective value, that is we fix z to the best solution found and resolve the corresponding sub-problem with big-M regularization. A “ ” indicates that the solver could not finish the root node inspection within the time limit (one hour).
Big-M Ridge Overall mpunit CPLEX Cuts Improvement CPLEX Cuts Improvement Improvement 40 0 109 1.17 1.16 0.86% 1.55 1.16 24.38% 1.74% ⇥ 80 0 109 8.13 7.52 6.99% 9.95 7.19 26.74% 10.85% ⇥ 120 0 1010 3.03 2.10 29.94% 1.94 % 35.30% ⇥ 160 0 1010 5.90 4.32 26.69% 4.07 % 30.91% ⇥ 200 0 1010 11.45 7.78 31.45% 7.50 % 32.32% ⇥ 40 1 108 5.53 5.47 1.07% 5.97 5.45 8.74% 1.41% ⇥ 80 1 109 2.99 2.94 1.81% 3.16 2.95 6.78% 1.89% ⇥ 120 1 109 8.38 7.82 6.69% 7.82 % 6.86% ⇥ 160 1 1010 1.64 1.54 5.98% 1.54 % 6.03% ⇥ 200 1 1010 2.60 2.54 2.33% 2.26 % 12.98% ⇥ 40 2 108 4.45 4.38 1.62% 4.76 4.36 8.27% 2.06% ⇥ 80 2 109 2.44 2.31 5.39% 2.46 2.31 5.97% 5.40% ⇥ 120 2 109 6.23 5.89 5.55% 5.89 % 5.75% ⇥ 160 2 1011 1.22 1.16 4.74% 0.71 % 19.33% ⇥ 200 2 1010 2.06 1.43 30.46% 1.01 % 73.43% ⇥ 40 3 108 3.91 3.85 1.58% 4.13 3.85 6.73% 1.78% ⇥ 80 3 109 2.06 1.94 5.76% 2.04 1.94 5.44% 5.85% ⇥ 120 3 109 5.43 5.15 5.31% 4.2 % 12.35% ⇥ 40 4 108 3.32 3.28 1.35% 3.53 3.26 7.71% 1.85% ⇥ 80 4 109 1.88 1.77 5.59% 1.77 % 5.64% ⇥ ridge regularization term, successfully solved instances to opti- mization problem with logical constraints, and big-M regu- mality at a scale of one order of magnitude larger than previous larization often performs better in this setting. attempts as summarized in Table 2. • The e ciency of outer-approximation relies on the speed at which separation problems are solved. In this regard, special 4.2. Big-M Versus Ridge Regularization problem-structure or cardinality constraints on the discrete In this section, we compare big-M and ridge regularization variable z help. This has been the case in the network design, for the sparse portfolio selection problem (6). Figure 2 depicts sparse ERM and portfolio selection problems studied here. the relationship between the optimal allocation of funds x? and In summary, the benefits of applying big-M or ridge regulariza- the regularization parameter M (left) and (right). Qualita- tion are largely even and depend on the instance to be solved. 
tively, both parameters impact the investment profiles in a comparable manner and eventually select the same stocks. Yet, we observe two main differences: First, setting M < 1/k renders the entire problem infeasible, while the problem remains feasible for any γ ≥ 0. This is a serious concern when a lower bound on M is not known a priori. Second, the profile for ridge regularization seems smoother than its equivalent with big-M.

[Figure 2: Optimal allocation of funds between securities as the regularization parameter (M or γ) increases. Panel (a): big-M regularization; panel (b): ridge regularization. Data is obtained from the Russell 1000, with a cardinality budget of 5, a rank-200 approximation of the covariance matrix, a one-month holding period and an Arrow-Pratt coefficient of 1, as in Bertsimas and Cory-Wright [4]. Setting M < 1/k renders the entire problem infeasible.]

More generally, our numerical experiments [see 6, Section 4] suggest that big-M and ridge regularization play fundamentally the same role in reformulating logical constraints. There are, however, some key differences:

• Ridge regularization is more amenable to outer-approximation, because it makes the objective function strongly convex and breaks degeneracy issues which usually hinder convergence.

• Ridge regularization should be the method of choice when the objective function contains a naturally occurring strongly convex term which is sufficiently large (this occurs for sparse ERM and portfolio selection problems).

• Big-M regularization does not increase the complexity of linear separation problems, while ridge regularization does. As a result, ridge regularization yields more expensive separation problems when the underlying problem is a linear optimization problem.

Table 2: Largest sparse portfolio instances reliably solved by each approach

Reference                        Solution method                          Largest instance size solved (no. securities)
Frangioni and Gentile [16]       Perspective cut + SDP                    400
Bonami and Lejeune [9]           Nonlinear B&B                            200
Gao and Li [17]                  Lagrangian relaxation B&B                300
Zheng et al. [28]                Branch-and-cut + SDP                     400
Bertsimas and Cory-Wright [4]    Algorithm 1 with ridge regularization    3,200

5. Conclusion

In this work, we proposed a new interpretation of the big-M method, as a regularization term rather than a modeling trick.
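The feasibility contrast between the two regularizations can be made concrete with a toy example. The sketch below is purely illustrative (it is not the authors' Algorithm 1; the helper names and the brute-force support enumeration are our own): it checks the simple big-M feasibility condition k·M ≥ 1 implied by a unit budget with a cardinality constraint, and solves a tiny ridge-regularized portfolio subproblem per support, which is well-posed for any γ > 0.

```python
import itertools
import numpy as np

def bigm_feasible(k, M):
    """With weights w >= 0, sum(w) = 1 and at most k nonzeros each capped
    at M, a feasible portfolio exists iff k * M >= 1 (cf. M < 1/k above)."""
    return k * M >= 1

def best_ridge_portfolio(mu, Sigma, k, gamma):
    """Brute-force over size-k supports S:
        min_w  w' Sigma w - mu' w + ||w||^2 / (2 * gamma)
        s.t.   sum(w) = 1,  w supported on S.
    Each equality-constrained QP is solved via its KKT system, which is
    nonsingular for any gamma > 0 (no feasibility issue, unlike big-M).
    Nonnegativity is dropped for simplicity in this illustration."""
    n = len(mu)
    best_val, best_w = np.inf, None
    for S in map(list, itertools.combinations(range(n), k)):
        Q = 2.0 * Sigma[np.ix_(S, S)] + np.eye(k) / gamma  # objective Hessian
        # KKT conditions: [Q 1; 1' 0] [w; lam] = [mu_S; 1]
        A = np.block([[Q, np.ones((k, 1))],
                      [np.ones((1, k)), np.zeros((1, 1))]])
        b = np.concatenate([mu[S], [1.0]])
        w = np.linalg.solve(A, b)[:k]
        val = w @ Sigma[np.ix_(S, S)] @ w - mu[S] @ w + (w @ w) / (2 * gamma)
        if val < best_val:
            best_val, best_w = val, np.zeros(n)
            best_w[S] = w
    return best_w
```

With a cardinality budget of k = 5, `bigm_feasible(5, M)` is False for any M < 0.2, mirroring the M < 1/k infeasibility threshold discussed above, while the ridge subproblems remain solvable for every γ > 0.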
By expanding this regularization interpretation to include ridge regularization, we considered a wide family of relevant problems from the operations research literature and derived equivalent reformulations as mixed-integer saddle-point problems, which naturally give rise to theoretical analysis and computational algorithms. Our framework provides provably near-optimal solutions in polynomial time via solving Boolean relaxations and performing randomized rounding, as well as certifiably optimal solutions through an efficient branch-and-bound procedure, and indeed frequently outperforms the state-of-the-art in numerical experiments.

We believe our framework, which decomposes the problem into a discrete master problem and continuous subproblems, could be extended more generally to mixed-integer conic optimization, such as mixed-integer semidefinite optimization, as developed in Bertsimas et al. [7].

References

[1] M. S. Aktürk, A. Atamtürk, and S. Gürel. A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Operations Research Letters, 37(3):187–191, 2009.
[2] A. Atamtürk and A. Gómez. Rank-one convexification for sparse regression. arXiv:1901.10334, 2019.
[3] D. P. Bertsekas. Nonlinear Programming: 3rd Edition. Athena Scientific, Belmont, 2016.
[4] D. Bertsimas and R. Cory-Wright. A scalable algorithm for sparse portfolio selection. arXiv:1811.00138, 2018.
[5] D. Bertsimas, A. King, and R. Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, 44(2):813–852, 2016.
[6] D. Bertsimas, R. Cory-Wright, and J. Pauphilet. A unified approach to mixed-integer optimization: Nonlinear formulations and scalable algorithms. arXiv preprint arXiv:1907.02109, 2019.
[7] D. Bertsimas, J. Lamperski, and J. Pauphilet. Certifiably optimal sparse inverse covariance estimation. Mathematical Programming, 2019.
[8] D. Bertsimas, J. Pauphilet, and B. Van Parys. Sparse regression: Scalable algorithms and empirical performance. Statistical Science, 2019.
[9] P. Bonami and M. A. Lejeune. An exact solution approach for portfolio optimization problems under stochastic and integer constraints. Operations Research, 57(3):650–670, 2009.
[10] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[11] I. Dunning, J. Huchette, and M. Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017.
[12] M. A. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36(3):307–339, 1986.
[13] M. Fischetti, I. Ljubić, and M. Sinnl. Redesigning Benders decomposition for large-scale facility location. Management Science, 63(7):2146–2162, 2016.
[14] A. Frangioni and C. Gentile. Perspective cuts for a class of convex 0–1 mixed integer programs. Mathematical Programming, 106(2):225–236, 2006.
[15] A. Frangioni and C. Gentile. SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Operations Research Letters, 35(2):181–185, 2007.
[16] A. Frangioni and C. Gentile. A computational comparison of reformulations of the perspective relaxation: SOCP vs. cutting planes. Operations Research Letters, 37(3):206–210, 2009.
[17] J. Gao and D. Li. Optimal cardinality constrained portfolio selection. Operations Research, 61(3):745–761, 2013.
[18] A. M. Geoffrion. Generalized Benders decomposition. Journal of Optimization Theory and Applications, 10(4):237–260, 1972.
[19] F. Glover. Improved linear integer programming formulations of nonlinear integer problems. Management Science, 22(4):455–460, 1975.
[20] O. Günlük and J. Linderoth. Perspective reformulations of mixed integer nonlinear programs with indicator variables. Mathematical Programming, 124(1-2):183–205, 2010.
[21] K. Holmberg and J. Hellstrand. Solving the uncapacitated network design problem by a Lagrangean heuristic and branch-and-bound. Operations Research, 46(2):247–259, 1998.
[22] J. E. Kelley, Jr. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8(4):703–712, 1960.
[23] J. B. Lasserre. An explicit exact SDP relaxation for nonlinear 0-1 programs. In International Conference on Integer Programming and Combinatorial Optimization, pages 293–303. Springer, 2001.
[24] M. Laurent. A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0–1 programming. Mathematics of Operations Research, 28(3):470–496, 2003.
[25] P. Raghavan and C. D. Tompson. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365–374, 1987.
[26] J. Renegar and B. Grimmer. A simple nearly-optimal restart scheme for speeding-up first order methods. arXiv:1803.00151, 2018.
[27] W. Xie and X. Deng. The CCP selector: Scalable algorithms for sparse ridge regression from chance-constrained programming. arXiv:1806.03756, 2018.
[28] X. Zheng, X. Sun, and D. Li. Improving the performance of MIQP solvers for quadratic programs with cardinality and minimum threshold constraints: A semidefinite program approach. INFORMS Journal on Computing, 26(4):690–703, 2014.

Research Highlight: Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency
Sebastian Perez-Salazar, Ishai Menache, Mohit Singh, Alejandro Toriello
We would like to thank the ICS Student Paper Award committee, which was chaired by Anna Nagurney and included Sergiy Butenko and Frank E. Curtis, for selecting our work as a Runner-Up.

1 Introduction

Cloud computing has motivated renewed interest in resource allocation, manifested in new consumption models (e.g., AWS spot pricing), as well as the design of resource-sharing platforms [1, 2]. These platforms need to support a heterogeneous set of users that share the same physical computing resources, e.g., CPU, memory, I/O bandwidth. Providers such as Amazon, Microsoft and Google offer cloud services with the goal of benefiting from economies of scale. However, inefficient use of resources – over-provisioning on the one hand or congestion on the other – could result in a low return on investment or in loss of customer goodwill, respectively. Hence, resource allocation algorithms are key to efficiently utilizing cloud resources.

To ensure quality of service, cloud offerings often come with a service level agreement (SLA) between the provider and the users. An SLA specifies the amount of resource the user is entitled to consume. Perhaps the most common example is renting a virtual machine (VM) that guarantees an explicit amount of CPU, memory, etc. Naturally, VMs that guarantee more resources are more expensive. In this context, a simple allocation policy is to assign each user the resources specified by their SLA. However, such an allocation can be wasteful, as users may not need the resource at all times. In principle, a dynamic allocation of resources can increase the total efficiency of the system. However, allocating resources dynamically without carefully accounting for SLAs can lead to user dissatisfaction.

Recent scheduling proposals address these challenges through work-maximizing yet fair schedulers [3, 4]. However, such schedulers do not have explicit SLA guarantees. On the other hand, other works focus on enforcing SLAs [5–7], but do not explicitly optimize the use of extra resources.

Our goal in this work is to understand the fundamental tradeoff between high utilization of resources and SLA satisfaction of individual users. In particular, we design algorithms that simultaneously guarantee both near-optimal utilization and the satisfaction of individual SLAs. To that end, we formulate a basic model for online dynamic resource allocation. We focus on a single divisible resource, such as CPU or I/O bandwidth, that has to be shared among multiple users. Each user also has an SLA that specifies the fraction of the resource it expects to obtain. The actual demand of each user is in general time-varying, and may exceed the fraction specified in the SLA. As in many real systems, the demand is not known in advance, but rather arrives in an online manner. Arriving demand is either processed or queued up, depending on resource availability. In many real-world scenarios, it is difficult to measure the actual demand size (see, e.g., [8]). Accordingly, we assume that the system (and the underlying algorithm) receives only simple binary feedback per user at any given time: whether the user's queue is empty (the user's work arriving so far has been completed) or not. This is a plausible assumption in many systems, because one can observe workload activity, yet anticipating how much of the resource a job will require is more difficult. Additionally, it also models settings where demands are not known in advance.

While online dynamic resource allocation problems have been studied in different contexts and communities, our work aims to address the novel aspects aris-