UNIVERSITÉ PARIS 13
LABORATOIRE D'INFORMATIQUE DE PARIS NORD

HABILITATION À DIRIGER DES RECHERCHES

spécialité SCIENCES (Informatique)

par

Claudia D’Ambrosio

Solving well-structured MINLP problems

Jury:

Emilio Carrizosa, Universidad de Sevilla, Rapporteur
Sourour Elloumi, École Nationale Supérieure de Techniques Avancées, Examinateur
Ignacio Grossmann, Carnegie Mellon University, Rapporteur
Jean-Bernard Lasserre, CNRS-LAAS, Examinateur
Ruth Misener, Imperial College London, Rapporteur
Frédéric Roupin, Université Paris 13, Parrain
Roberto Wolfler Calvo, Université Paris 13, Parrain

Contents

Acknowledgments

Keywords

Preface

I Curriculum Vitae

II Research Summary

1 Introduction to MINLP Problems and Methods
  1.1 Modeling problems by Mathematical Programming
    1.1.1 Reformulations
    1.1.2 Relaxations
    1.1.3 Approximations
    1.1.4 Solution methods
  1.2 Mixed Integer Technology
  1.3 Non Linear Programming Technology
  1.4 Mixed Integer Non Linear Programming Technology

2 Exact Methods for MINLP Problems
  2.1 Methods based on strengthening of classic relaxations
    2.1.1 Euclidean Steiner Tree
    2.1.2 The pooling problem and its variant with binary variables
  2.2 On Separable Functions
    2.2.1 SC-MINLP
    2.2.2 Polyopt

3 Heuristic Methods
  3.1 General methods
    3.1.1 Feasibility Pump


    3.1.2 Optimistic approximation
  3.2 Tailored methods
    3.2.1 Non linear Knapsack Problem

4 Other contributions
  4.1 Feasibility issues on Hydro Unit Commitment
  4.2 MILP models for the selection of a small set of well-distributed points
  4.3 Well placement in reservoir engineering
  4.4 A heuristic for multiobjective convex MINLPs
  4.5 Bilevel programming and smart grids
  4.6 Feasibility Pump for aircraft deconfliction problems
  4.7 On the Product Knapsack Problem
  4.8 Learning to Solve Hydro Unit Commitment Problems in France

5 Conclusions and Future Research

Bibliography

A Selected Publications

Acknowledgments

The road that brought me here was long. The more I think about when my passion for Optimization started, the more I am convinced that it dates back to when I was a third-year bachelor student at the University of Bologna. One of the optional classes I chose and attended was Practical Tools for Optimization, taught by Professor Andrea Lodi: besides a natural inclination for the topic, Andrea's enthusiasm was so captivating that for my Master I chose all the classes on Optimization that were offered and a Master thesis on mixed integer non linear optimization, a topic I got immediately addicted to. In 2006 I started my Ph.D. at the University of Bologna, where the O.R. group welcomed me warmly from the very beginning. I deeply thank all the members of this wonderful group, with a special thought for Andrea who, as Ph.D. advisor, was and is an inspiration to me.

Back in 2007 I had the chance to spend my summer working as an intern at the IBM T.J. Watson Research Center: Jon Lee and Andreas Wächter were a great manager and tutor, respectively, and all the colleagues at IBM helped to make it a great experience. Then in 2010 I got a post-doc position at the University of Wisconsin-Madison, where I had the chance to be supervised by Jeff Linderoth and Jim Luedtke. I learned a lot in six months, and the festive environment made me enjoy my Mid-West experience very much. Thanks also to all the students of the group, who were lovely officemates.

In October 2011 I moved to France to join the CNRS family and, in particular, I was affiliated with the SYSMO team at LIX, École Polytechnique. Among the SYSMO members, I would like to thank, in particular, Leo Liberti, who is a precious colleague and friend; Evelyne Rayssac, who has helped me a lot since the very first moment to survive the crazy bureaucracy; the former and current Ph.D. students Claire Lizon, Youcef Sahraoui, Luca Mencarelli, Vy Khac Ky, Gustavo Dias, Oliver Wang, Kostas Tavlaridis-Gyparakis, and Gabriele Iommazzo; and the former and current post-docs Sonia Toubaline, Pierre-Louis Poirion, and Dimitri Thomopulos. Moreover, I would like to thank Fabio Roda and Sylvie Tonda, who have worked hard to convince industries of the importance of Optimization. Earlier this year the SYSMO team gave way to the DASCIM team, and I thank Michalis Vazirgiannis, who took the lead of such a challenging project. I hope we have several years of fruitful collaborations ahead.

I started discussing this HDR with Roberto Wolfler-Calvo and Frédéric Roupin several years ago, maybe five. They immediately accepted to be my tutors and

were kind enough to wait five years to make this happen. Thanks a lot for your help and support; it was really precious to me that you did not put pressure on me to finish this when I needed to focus on something else, but did when I needed to finalize the manuscript. I warmly thank also the other members of the HDR jury, namely the "rapporteurs" Emilio Carrizosa, Ignacio Grossmann, and Ruth Misener, and the "examinateurs" Sourour Elloumi and Jean-Bernard Lasserre. It is really an honor to have them in my HDR jury.

I made a huge step forward with the writing of this HDR at the end of 2015, when I spent three months in Montréal. Andrea Lodi invited me to visit him and gave me the opportunity to work in the dynamic environment of GERAD and CIRRELT. I would like to thank him together with all the GERAD and CIRRELT members, in particular Miguel Anjos and Zhao Sun, fantastic collaborators, and Marta Pascoal, with whom I shared the office, lunches, and lovely times. I also would like to thank Sonia Vanier and the équipe SAMM of Université Paris 1, who were so kind to host me when I had to "hide" to focus on finalizing this manuscript: the nice desk with a view of Paris, the enjoyable coffee breaks with plenty of chocolate and laughs, and your hospitality helped me a lot to reach the end of this writing.

During the last ten years I have had the chance to meet and collaborate with several colleagues, each of whom taught me a lot and made me enjoy my work. I thank all the co-authors of the works presented in this manuscript and of all my on-going projects: it is also thanks to you that I am still passionate about research!

Last but not least, I need to devote a special thought to my parents, my sister, my friends, and, in particular, to Alessandro, who has supported me constantly and with unique patience during the last nine years. I am really lucky and grateful to have such a generous and understanding husband.

Paris, 2018
Claudia D'Ambrosio

Keywords

Mixed integer non linear programming

Non convex problems

Well-structured problems

Real-world applications

Modeling

Preface

This manuscript summarizes the main research topics I have been working on since my Ph.D. at the University of Bologna (Italy). The broader common topic is mixed integer non linear programming (MINLP) problems, i.e., optimization problems in which both continuous and integer decision variables are present and in which, in general, non linear functions define the objective to minimize and the constraints to satisfy. In this context, I have been working both on theoretical aspects and on real-world applications. All the research performed aims at solving a class of MINLP problems by exploring and exploiting the specific properties that characterize such a class. The manuscript is composed of two parts. Part I includes a complete curriculum vitæ and the most relevant publications. Part II is divided as follows:

In Chapter 1, an introduction is presented on MINLP problems and on classical methods to solve them and their most widely used subclasses.

The research concerning exact methods for classes of MINLPs is described in Chapter 2. In particular, the chapter is composed of two main parts: i. methods based on the strengthening of classical relaxations, and ii. methods based on non-standard relaxations.

In Chapter 3 we discuss heuristic methods for generic MINLP problems, i.e., the Feasibility Pump and the Optimistic Approximation for non convex MINLPs, and for specific problems such as the non linear knapsack problem and its generalizations. The tailored methods exploit special properties of subclasses of MINLPs: for example, the heuristics for the non linear knapsack problem exploit the fact that the only non linear functions, i.e., the profit and weight functions, are separable and non decreasing.

Other contributions are summarized in Chapter 4.

Finally, conclusions and future research are discussed in Chapter 5.
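For reference, the generic MINLP problems addressed throughout Part II can be sketched in a standard textbook form (the notation below is generic and is an illustration, not the manuscript's own formulation):

```latex
\begin{align*}
\min_{x,\,y} \quad & f(x, y) \\
\text{s.t.} \quad  & g_i(x, y) \le 0, \qquad i = 1, \dots, m, \\
                   & x \in \mathbb{R}^n, \quad y \in \mathbb{Z}^p,
\end{align*}
% where, in general, $f$ and the $g_i$ are non linear and possibly non convex;
% the convex MINLP subclass arises when $f$ and all $g_i$ are convex.
```

When $f$ and the $g_i$ are non convex, even the continuous relaxation (dropping $y \in \mathbb{Z}^p$) is a global optimization problem, which is what makes exploiting problem-specific structure, the common thread of this manuscript, so valuable.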

Part I

Curriculum Vitae


Curriculum Vitae

Claudia D'Ambrosio
June 11, 2018

CNRS Chargée de Recherche and Chargée d'Enseignement at:

LIX, Laboratoire d'Informatique de l'École Polytechnique
91128 Palaiseau CEDEX, France
Office: LIX 1066 (Bâtiment Alan Turing, first floor)
E-mail address: [email protected]
Office phone n.: +33.1.77.57.80.06

Personal Information and Education

From January 2006 to April 2009: PhD student in "Automatic Control and Operations Research" at the Dipartimento di Elettronica, Informatica e Sistemistica (DEIS), University of Bologna (Italy). Thesis: "Application-oriented Mixed Integer Non-Linear Programming", advisor Prof. Andrea Lodi.

21st March 2005: Master degree in Computer Science Engineering at the University of Bologna (Italy). Thesis: "Algorithms for the Water Network Design Problem", advisor Prof. Paolo Toth, co-advisors Prof. Andrea Lodi and Dr. Cristiana Bragalli.

Awards

2nd Robert Faure ROADEF prize 2015 (this is the triennial prize of the French OR society, with three laureates at each edition).

EURO Doctoral Dissertation Award 2010 with the PhD Thesis "Application-oriented Mixed Integer Non-Linear Programming", EURO Conference 2010, Lisbon (Portugal), 2010. It is the highest European recognition for a Ph.D. thesis in OR.

Funded ongoing projects

• Principal Investigator and Supervisory Board member for CNRS, European Project (Marie Curie) "MINOA", 2018-2021.

• Leader of the Programme Gaspard Monge pour l'Optimisation project "Shortest path problem variants for the hydro unit commitment problem", funded by the Fondation Mathématique Jacques Hadamard.

• Leader of the Project "Learning to solve Hydro Unit Commitment problems in France", sponsored by a Siebel Energy Institute seed grant, http://www.siebelenergyinstitute.org/2016-research-grants/.

• Participant of the Project "From resource to price: machine learning for Italian electricity network and market", sponsored by a Siebel Energy Institute seed grant, http://www.siebelenergyinstitute.org/2016-research-grants/.

Funded past projects

• Leader of the Programme Gaspard Monge pour l'Optimisation project "Decomposition and feasibility restoration for Cascaded Reservoir Management", funded by the Fondation Mathématique Jacques Hadamard.

• Participant of the Programme Gaspard Monge pour l'Optimisation project "Résolution de programmes polynomiaux, approche par relaxation quadratique convexe", funded by the Fondation Mathématique Jacques Hadamard.

• Management Committee member of the COST Action TD1207 "Mathematical Optimization in the Decision Support Systems for Efficient and Robust Energy Networks" (http://www.cost.eu/domains_actions/ict/Actions/TD1207).

• Leader of the Programme Gaspard Monge pour l'Optimisation project "Optimality for Tough Combinatorial Hydro Valley Problems", funded by the Fondation Mathématique Jacques Hadamard (http://www.lix.polytechnique.fr/~dambrosio/PGMO.php).

• Participant of the European Project (Marie Curie) "MINO: Mixed Integer Nonlinear Optimization" (http://www.mino-itn.unibo.it/).

• Participant of the ANR JCJC project "ATOMIC: Air Traffic Optimization via Mixed Integer Computation" (http://www.recherche.enac.fr/~cafieri/atomic.html).

Professional Experience

From October 2015: senior research scientist (formerly chargée de recherche 1ère classe) at CNRS in the LIX, École Polytechnique (France).

From September 2014: adjunct assistant professor (chargée d'enseignement) at École Polytechnique (France).

From October 2011 to September 2015: research scientist (chargée de recherche 2ème classe) at CNRS in the LIX, École Polytechnique (France).

From November 2010 to August 2011: post-doc in "Algorithms and techniques for MINLP problems with a special attention to optimization problems for unit commitment, scheduling and distribution of electric power" at the Dipartimento di Elettronica, Informatica e Sistemistica (DEIS), University of Bologna (Italy).

From May to October 2010: post-doc in "Mixed Integer NonLinear Programming" at the Industrial and Systems Engineering Department (ISyE), University of Wisconsin - Madison (U.S.A.).

From May 2009 to April 2010: 1-year post-doc in “Algorithms and techniques for MINLP problems with a special attention to optimization problems for unit commitment, scheduling and distribution of electric power” at the Dipartimento di Elettronica, Informatica e Sistemistica (DEIS), University of Bologna (Italy).

From May 2008 to July 2008: consulting for OPTIT on a project with Hera Comm on the optimization of energy cogeneration plants under the supervision of Prof. Andrea Lodi.

From July 2007 to September 2007: working as a co-op/intern at the IBM TJ Watson Research Center (Yorktown, NY) under the supervision of Dr. Jon Lee and Dr. Andreas Wächter on the research project "Mixed Integer Nonlinear Programming".

From 2006: tutoring in Operations Research (undergraduate and master level).

From 2006 to 2009: PhD student in "Automatic Control and Operations Research" at the Dipartimento di Elettronica, Informatica e Sistemistica (DEIS), University of Bologna (Italy).

From July 2005 to December 2005: collaboration with the Operations Re- search group of the “Dipartimento di Elettronica, Informatica e Sistemistica (DEIS)” of the University of Bologna for the development of algorithms for the Waste Management Problem (UE Project Integrated Urban Waste-Management Model (IUWMM)).

From April 2005 to June 2005: collaboration with the Operations Research group of the “Dipartimento di Elettronica, Informatica e Sistemistica (DEIS)” of the University of Bologna for the development of algorithms for the Capacity Planning Problem (UE Project PARTNER).

Publications

International Journals

1. C. D'Ambrosio, M. Fampa, J. Lee, S. Vigerske. On a Nonconvex MINLP Formulation of the Euclidean Steiner Tree Problem in n-Space: missing proofs, Optimization Letters (accepted).
2. L. Mencarelli, C. D'Ambrosio. Complex Portfolio Selection via Convex Mixed-Integer Quadratic Programming: A Survey, International Transactions in Operational Research (accepted).
3. P.-L. Poirion, S. Toubaline, C. D'Ambrosio, L. Liberti. An algorithm for bilevel programming, with applications to smart grids and social networks, Discrete Applied Mathematics (accepted).
4. Y. Sahraoui, P. Bendotti, C. D'Ambrosio. Real-world hydro-power unit-commitment: dealing with numerical errors and feasibility issues, Energy (accepted).
5. S. Cafieri, C. D'Ambrosio. Feasibility Pump algorithms for aircraft deconfliction with speed regulation, Journal of Global Optimization (online first, https://doi.org/10.1007/s10898-017-0560-7).
6. C. D'Ambrosio, F. Furini, M. Monaci, E. Traversi. On the 0-1 Product Knapsack Problem, Optimization Letters, 12 (4), pp. 691–712, 2018.
7. C. D'Ambrosio, S. Martello, L. Mencarelli. Relaxations and Heuristics for the General Multiple Non-linear Knapsack Problem, Computers & Operations Research, 93, pp. 79–89, 2018.

8. S. Toubaline, C. D'Ambrosio, L. Liberti, P.-L. Poirion, B. Schieber, H. Shachnai. Complexity and inapproximability results for the Power Edge Set problem, Journal of Combinatorial Optimization, 35 (3), pp. 895–905, 2018.
9. C. Buchheim, C. D'Ambrosio. Monomial-wise Optimal Separable Underestimators for Mixed-Integer Polynomial Optimization, Journal of Global Optimization, 67 (4), pp. 759–786, 2017.
10. V. Cacchiani, C. D'Ambrosio. A Branch-and-Bound based Heuristic Algorithm for Convex Multi-Objective MINLPs, European Journal of Operational Research, 260 (3), pp. 920–933, 2017.
11. C. D'Ambrosio, A. Frangioni, A. Lodi, M. Mevissen. Special issue on: Nonlinear and combinatorial methods for energy optimization, EURO Journal on Computational Optimization, 5 (1), pp. 1–3, 2017.
12. R. Taktak, C. D'Ambrosio. An Overview on Mathematical Programming Approaches for the Deterministic Unit Commitment Problem in Hydro Valleys, Energy Systems, 8 (1), pp. 57–79, 2017.
13. C. D'Ambrosio, K.K. Vu, C. Lavor, L. Liberti, N. Maculan. New error measures and methods for realizing protein graphs from distance data, Discrete & Computational Geometry, 57 (2), pp. 371–418, 2017.
14. K.K. Vu, C. D'Ambrosio, Y. Hamadi, L. Liberti. Surrogate-based methods for black-box optimization, International Transactions in Operational Research, 24 (3), pp. 393–424, 2017.
15. C. D'Ambrosio, G. Nannicini, G. Sartor. MILP models for the selection of a small set of well-distributed points, Operations Research Letters, 45, pp. 46–52, 2017.

16. P.-L. Poirion, S. Toubaline, C. D'Ambrosio, L. Liberti. The Power Edge Set problem, Networks, 68 (2), pp. 104–120, 2016.
17. C. D'Ambrosio, A. Lodi, S. Wiese, C. Bragalli. Mathematical Programming techniques in Water Network Optimization, European Journal of Operational Research, 243 (3), pp. 774–788, 2015.
18. R. Rovatti, C. D'Ambrosio, A. Lodi, S. Martello. Optimistic MILP Modeling of Non-linear Optimization Problems, European Journal of Operational Research, 239 (1), pp. 32–45, 2014.
19. G. Costa, C. D'Ambrosio, S. Martello. GraphsJ 3: A Modern Didactic Application for Graph Algorithms, Journal of Computer Science, 10 (7), pp. 1115–1119, 2014.
20. C. D'Ambrosio, A. Lodi. Mixed integer nonlinear programming tools: an updated practical overview, Annals of Operations Research, 204, pp. 301–320, 2013.
21. C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. A Storm of Feasibility Pumps for Nonconvex MINLP, Mathematical Programming, 136 (2), pp. 375–402, 2012.
22. C. Bragalli, C. D'Ambrosio, J. Lee, A. Lodi, P. Toth. On the Optimal Design of Water Distribution Networks: a Practical MINLP Approach, Optimization and Engineering, 13, pp. 219–246, 2012.
23. C. D'Ambrosio, A. Lodi. Mixed Integer Non-Linear Programming Tools: a Practical Overview, 4OR: A Quarterly Journal of Operations Research, 9 (4), pp. 329–349, 2011.
24. C. D'Ambrosio, S. Martello. Heuristic algorithms for the general nonlinear separable knapsack problems, Computers and Operations Research, 38 (2), pp. 505–513, 2011.
25. C. D'Ambrosio. Application-oriented Mixed Integer Non-Linear Programming, 4OR: A Quarterly Journal of Operations Research, 8 (3), pp. 319–322, 2010.
26. C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. On Interval-subgradient and No-good Cuts, Operations Research Letters, 38, pp. 341–345, 2010.
27. G. Costa, C. D'Ambrosio, S. Martello. A free educational Java framework for graph algorithms, Journal of Computer Science, 6 (1), pp. 87–91, 2010.
28. C. D'Ambrosio, A. Lodi, S. Martello. Piecewise linear approximation of functions of two variables in MILP models, Operations Research Letters, 38, pp. 39–46, 2010.
29. A. Borghetti, C. D'Ambrosio, A. Lodi, S. Martello. A MILP Approach for Short-Term Hydro Scheduling and Unit Commitment with Head-Dependent Reservoir, IEEE Transactions on Power Systems, 23 (3), pp. 1115–1124, 2008.

Book Chapters (refereed)

1. C. Bragalli, C. D'Ambrosio, J. Lee, A. Lodi, P. Toth. Optimizing the Design of Water Distribution Networks Using Mathematical Optimization. In K.G. Murty (ed.), Case Studies in Operations Research: applications of optimum decision making, International Series in Operations Research & Management Science, 212, pp. 183–198, Springer-Verlag New York, 2015.

2. A. Borghetti, C. D'Ambrosio, A. Lodi, S. Martello. Optimal scheduling of a multi-unit hydro power station in a short-term time horizon. In K.G. Murty (ed.), Case Studies in Operations Research: applications of optimum decision making, International Series in Operations Research & Management Science, 212, pp. 167–181, Springer-Verlag New York, 2015.

3. C. D'Ambrosio, J. Lee, A. Wächter. An algorithmic framework for MINLP with separable non-convexity. In J. Lee and S. Leyffer (eds.), Mixed-Integer Nonlinear Optimization: Algorithmic Advances and Applications, The IMA Volumes in Mathematics and its Applications, 154, pp. 315–347, Springer New York, 2012.

4. C. D'Ambrosio, A. Lodi, S. Martello. Combinatorial Traveling Salesman Problem Algorithms. In J.J. Cochran et al. (eds.), Wiley Encyclopedia of Operations Research and Management Science, John Wiley and Sons, Inc., 1, pp. 738–747, 2010.

Conferences (refereed)

1. P.-O. Bauguion, C. D'Ambrosio, L. Liberti. Maximum concurrent flow with incomplete data. In Lecture Notes in Computer Science, ISCO 2018, to appear.

2. W. van Ackooij, C. D'Ambrosio, L. Liberti, R. Taktak, D. Thomopulos, S. Toubaline. Shortest Path Problem variants for the Hydro Unit Commitment Problem. Electronic Notes in Discrete Mathematics, to appear.

3. W. van Ackooij, C. D'Ambrosio, R. Taktak. Decomposition and feasibility restoration for cascaded reservoir management. In A. Sforza and C. Sterle (eds.), Optimization and Decision Science: Methodologies and Applications: ODS, Sorrento, Italy, September 4-7, 2017, Springer International Publishing, pp. 629–637.

4. C. D'Ambrosio, L. Liberti. Distance geometry in linearizable norms. In Lecture Notes in Computer Science, vol. 10589, Geometric Science of Information: Third International Conference, GSI 2017, Paris, France, November 7-9, 2017, Proceedings, F. Nielsen and F. Barbaresco (eds.), Springer International Publishing, pp. 830–837, 2017.
5. L. Liberti, C. D'Ambrosio. The Isomap algorithm in distance geometry. In Leibniz International Proceedings in Informatics (LIPIcs), vol. 75, 16th International Symposium on Experimental Algorithms (SEA 2017), C.S. Iliopoulos, S.P. Pissis, S.J. Puglisi, and R. Raman (eds.), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, pp. 5:1–5:13, 2017. ISBN 978-3-95977-036-1, ISSN 1868-8969.
6. O. Wang, L. Liberti, C. D'Ambrosio, C. De Sainte Marie, C. Ke. Controlling the average behaviour of business rules programs. In Lecture Notes in Computer Science, vol. 9718, Rule Technologies. Research, Tools, and Applications: 10th International Symposium, RuleML 2016 Proceedings, J.J. Alferes et al. (eds.), Springer International Publishing, pp. 83–96, 2016.
7. S. Toubaline, P.-L. Poirion, C. D'Ambrosio, L. Liberti. Observing the state of a smart grid using bilevel programming. In Lecture Notes in Computer Science, vol. 9486, Combinatorial Optimization and Applications COCOA 2015, Z. Lu et al. (eds.), Springer International Publishing, pp. 364–376, 2015.
8. L. Liberti, C. D'Ambrosio, P.-L. Poirion, S. Toubaline. Measuring smart grids. In Proceedings of AIRO 2015, pp. 11–20, 2015.
9. C. D'Ambrosio, M. Fampa, J. Lee, S. Vigerske. On a Nonconvex MINLP Formulation of the Euclidean Steiner Tree Problem in n-Space. In Lecture Notes in Computer Science, vol. 9125, International Symposium on Experimental Algorithms SEA 2015, E. Bampis (ed.), pp. 122–133, 2015.
10. C. Lizon, C. D'Ambrosio, L. Liberti, M. Le Ravalec, D. Sinoquet. A Mixed-integer Nonlinear Optimization Approach for Well Placement and Geometry. In Proceedings of ECMOR XIV - 14th European conference on the mathematics of oil recovery, Catania, Italy, pp. 1838–1848, 2014.
11. C. Buchheim, C. D'Ambrosio. Box-Constrained Mixed-Integer Polynomial Optimization Using Separable Underestimators. In Lecture Notes in Computer Science, vol. 8494, Integer Programming and Combinatorial Optimization Conference - IPCO 2014, J. Lee and J. Vygen (eds.), Springer International Publishing, pp. 198–209, 2014.
12. C. D'Ambrosio, J.T. Linderoth, J. Luedtke. Valid Inequalities for the Pooling Problem with Binary Variables. In Lecture Notes in Computer Science, vol. 6655, Integer Programming and Combinatorial Optimization Conference - IPCO 2011, O. Gunluk and G.J. Woeginger (eds.), Springer-Verlag, pp. 117–129, 2011.
13. C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. Experiments with a Feasibility Pump Approach for Non-Convex MINLPs. In Lecture Notes in Computer Science, vol. 6049, 9th International Symposium on Experimental Algorithms - SEA 2010, P. Festa (ed.), Springer-Verlag, pp. 350–360, 2010.
14. C. D'Ambrosio, J. Lee, A. Wächter. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity. In A. Fiat and P. Sanders (eds.), ESA 2009 (17th Annual European Symposium, Copenhagen, Denmark, September 2009), Lecture Notes in Computer Science 5757, pp. 107–118, Springer-Verlag Berlin Heidelberg, 2009.

15. C. Bragalli, C. D'Ambrosio, J. Lee, A. Lodi, P. Toth. An MINLP solution method for a Water Network Problem. In Y. Azar and T. Erlebach (eds.), Algorithms - ESA 2006 (14th Annual European Symposium, Zurich, Switzerland, September 2006), Lecture Notes in Computer Science, pp. 696–707, Springer, 2006.

Short Papers in Conferences (refereed)

1. L. Mencarelli, C. D'Ambrosio, A. Di Zio, S. Martello. Heuristics for the General Multiple Non-linear Knapsack Problem. Electronic Notes in Discrete Mathematics, 55, pp. 69–72, 2016.

2. C. D'Ambrosio, A. Frangioni, C. Gentile. Strengthening Convex Relaxations of Mixed Integer Non Linear Programming Problems with Separable Non Convexities. In Proceedings of Global Optimization Workshop 2016 (GOW'16), Braga, Portugal, pp. 49–52, 2016.

3. C. D'Ambrosio, J. Linderoth, J. Luedtke, J. Schweiger. Strong Convex Nonlinear Relaxations of the Pooling Problem. Oberwolfach workshop on "Mixed-integer Nonlinear Optimization: A Hatchery for Modern Mathematics", MFO, Oberwolfach, pp. 2715–2716, October 2015.
4. C. Lizon, C. D'Ambrosio, L. Liberti. Méthode non linéaire mixte en variables entières et réelles pour le placement et la géométrie des puits en ingénierie de réservoir. In Proceedings of Journées Polyèdres et Optimisation Combinatoire (JPOC9), Le Havre, France, 2015.
5. P.-L. Poirion, S. Toubaline, C. D'Ambrosio, L. Liberti. Localization on smart grids. In Proceedings of Mathematical and Applied Global Optimization (GOW'14), Global Optimization Workshop 2014, Malaga, Spain, pp. 101–103, 2014.
6. C. D'Ambrosio, K. Vu Khac, C. Lavor, L. Liberti, N. Maculan. Computational experience on Distance Geometry Problems. In Proceedings of Mathematical and Applied Global Optimization (GOW'14), Global Optimization Workshop 2014, Malaga, Spain, pp. 97–100, 2014.

7. C. D'Ambrosio, S. Martello. A heuristic algorithm for the general nonlinear separable knapsack problem. In Proceedings of European Workshop on Mixed Integer Nonlinear Programming, Marseille, France, pp. 115–117, 2010.

8. C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. Feasibility Pump(s) for Non-Convex Mixed-Integer NonLinear Programs. In Proceedings of European Workshop on Mixed Integer Nonlinear Programming, Marseille, France, pp. 25–26, 2010.

Technical Reports and Working Papers

1. C. D'Ambrosio, J. Lee, L. Liberti, M. Ovsjanikov. Extrapolating curvature lines in rough concept sketches using mixed-integer nonlinear optimization (submitted).
2. J. Luedtke, C. D'Ambrosio, J. Linderoth, J. Schweiger. Strong Convex Nonlinear Relaxations of the Pooling Problem: Extreme Points, ZIB Report 18-13.
3. J. Luedtke, C. D'Ambrosio, J. Linderoth, J. Schweiger. Strong Convex Nonlinear Relaxations of the Pooling Problem, ZIB Report 18-12 (submitted).
4. M. Stéfanon, P. Drobinski, J. Badosa, S. Concettini, A. Creti, C. D'Ambrosio, D. Thomopulos, P. Tankov. Scenario for wind-solar energy mix in Italy from regional climate simulations (submitted).
5. C. D'Ambrosio, L. Liberti, P.-L. Poirion, K. Vu. Random projections for trust-region subproblems with applications to derivative-free optimization (submitted).

6. C. D'Ambrosio, A. Frangioni, C. Gentile. Strengthening the Sequential Convex MINLP Technique by Perspective Reformulations, IASI-CNR Report n. 17-01 (under review).

Divulgation Articles

1. C. D'Ambrosio. Non Convex Mixed Integer Non Linear Programming, Bulletin de la ROADEF, 36, pp. 4–7, 2016.

Theses

• C. D'Ambrosio. Application-oriented Mixed Integer Non-Linear Programming, PhD Thesis (April 2009). Under the supervision of Prof. Andrea Lodi (University of Bologna).

• C. D'Ambrosio. Algorithms for the Water Network Design Problem, Master Thesis (March 2005). Under the supervision of Dr. Cristiana Bragalli, Prof. Andrea Lodi, Prof. Paolo Toth (University of Bologna).

• C. D'Ambrosio. Query with Preferences: the Best Operator, Bachelor Thesis (December 2002). Under the supervision of Prof. Paolo Ciaccia (University of Bologna).

Plenary Talks

1. C. D’Ambrosio, Challenging Problems in Energy Optimization: the Hydro Unit Commitment, Havana, Cuba, December 2017.

2. C. D'Ambrosio, Challenging Problems in Energy Optimization: the Hydro Unit Commitment, Blumenau, Brazil, August 2017.

3. C. D'Ambrosio, Using Bilevel Programming to Monitor Smart Grids, Workshop on Nonlinear Optimization Algorithms and Industrial Applications, Toronto, Canada, June 2016.

4. C. D'Ambrosio, On Black-Box Mixed Integer Non Linear Programming, MIP, Mixed Integer Programming Workshop 2016, Miami, FL, USA, May 2016.
5. C. D'Ambrosio, Strong Convex Nonlinear Relaxations of the Pooling Problem, Oberwolfach workshop on "Mixed-integer Nonlinear Optimization: A Hatchery for Modern Mathematics", MFO, Oberwolfach, October 2015.
6. C. D'Ambrosio, Optimization based on mixed integer nonlinear programming (MINLP) method, Workshop "Modelling Smart Grids: A New Challenge for Stochastics and Optimization", University of Prague, Czech Republic, September 2015.
7. C. D'Ambrosio, Global optimization for MINLPs with separable nonconvexities: convex relaxations and perspective reformulations, 2nd Sevilla Workshop on "Mixed Integer Nonlinear Programming: Theory, algorithms and applications", University of Seville, Spain, March 2015.

8. C. D’Ambrosio. MINO/COST training school, University of Tilburg, The Netherlands, March 2015. 9. C. D’Ambrosio, On the Mathematical Models and Methods for the Hydro Unit Commitment Challenges, COST Workshop on Mathematical Models and Methods for Energy Optimization (CWM3EO), September 2014.

10. C. D'Ambrosio, 5th Porto Meeting on Mathematics for Industry, April 2014. Three talks:
• Basic notions of Mixed Integer Non-Linear Programming
• MINLP applications, part I: Hydro Unit Commitment and Pooling Problem
• MINLP applications, part II: Water Network Design and some applications of black-box optimization
11. C. D'Ambrosio, Mathematical Programming Approaches for Water Network Design, Journée Optimisation des Réseaux, November 2013.

12. C. D'Ambrosio, Real-world Mixed Integer NonLinear Programming applications: MILP and MINLP approaches, NATCOR Combinatorial Optimization course, Southampton, UK, September 2011.
13. C. D'Ambrosio, Optimistic MILP Modeling of Non-linear Optimization Problems, MIP, Mixed Integer Programming Workshop 2011, Waterloo, ON, Canada, June 2011 (invited by Prof. Ted Ralphs).
14. C. D'Ambrosio, A Tutorial on Convex Mixed-Integer Nonlinear Programming, 3rd LANCS Workshop on Discrete and Non-Linear Optimisation, Lancaster, UK, April 2011 (invited by Prof. Adam Letchford).

15. C. D'Ambrosio, A feasibility pump for non-convex MINLPs, 2nd LANCS Workshop on Discrete and Non-Linear Optimisation, Southampton, UK, March 2010 (invited by Prof. Adam Letchford).

16. C. D’Ambrosio, A Feasibility Pump Heuristic for Non-Convex MINLPs, Spring Workshop on Computational Issues in Mixed Integer Nonlinear Programming. Bordeaux, France, March 2009. (Invited by Prof. Andrew J. Miller.)

Seminars

1. C. D’Ambrosio, Classical formulations and strong valid inequalities for the pooling problem, Séminaire Parisien d’Optimisation at Institut Henri Poincaré (France), February 2017.

2. C. D’Ambrosio, Smart Grids Observability using Bilevel Programming, CORE O.R. Seminars at Université Catholique de Louvain (Belgium), April 2016.

3. C. D’Ambrosio, On the standard pooling problem and strong valid inequalities, CPSE Seminar Series at Imperial College (UK), February 2016.

4. C. D’Ambrosio, The pooling problem: classical formulations and stronger relaxations, IBM TJ Watson Research Center (USA), November 2015.

5. C. D’Ambrosio, The pooling problem: classical formulations and strong valid inequalities, GERAD, École Polytechnique de Montréal (Canada), November 2015.

6. C. D’Ambrosio, A Tutorial on the Methods and Practice of Mixed Integer Nonlinear Programming, Université de Liège (Belgium), March 2015.

7. C. D’Ambrosio, The importance of being... separable!, University of California Davis, USA, November 2014.

8. C. D’Ambrosio, A branch-and-bound method for box constrained integer polynomial optimization, LIPN, Paris 13, December 2013.

9. C. D’Ambrosio, A branch-and-bound method for box constrained integer polynomial optimization, Integer Programming for Lunch, IBM TJ Watson Research Center, USA, October 2013.

10. C. D’Ambrosio, Mixed Integer Linear Programming, EDF R&D, Clamart, France, September 2013 (2 lectures plus 2 exercises sessions).

11. C. D’Ambrosio, Fundamentals of the Theory and Practice of Mixed Integer Nonlinear Programming, Università Politecnica delle Marche, Ancona, June 2013.

12. C. D’Ambrosio, Branch-and-bound method for box-constrained polynomial optimization, Université catholique de Louvain, Louvain-la-Neuve, June 2013.

13. C. D’Ambrosio, Linear approximation techniques for mixed integer nonlinear programming: methods and a real-world application, Singapore University of Technology and Design, Singapore, January 2013.

14. C. D’Ambrosio, From Hydro Scheduling and UC with Head-Dependent Reservoir to Linear Approximation Techniques for Nonlinear Functions, EDF R&D, Clamart, France, December 2012.

15. C. D’Ambrosio, Optimality for Tough Combinatorial Hydro Valley Problems, PGMO Seminars, Palaiseau, France, November 2012.

16. C. D’Ambrosio, Feasibility Pump, heuristic methods for nonconvex Mixed Integer Nonlinear Programming problems, Technische Universität Dortmund, Germany, April 2012.

17. C. D’Ambrosio, Pooling problem with binary variables: mixed integer programming relaxations and valid inequalities, LIF, Marseille, France, February 2012.

18. C. D’Ambrosio, Mixed integer nonlinear programming: heuristic and exact methods, INRIA, Bordeaux, France, March 2011.

19. C. D’Ambrosio, Linear approximation techniques for mixed integer nonlinear programming: methods and a real-world application, INRIA, Bordeaux, France, January 2011.

20. C. D’Ambrosio, Water Network Design: An MINLP approach, COPTA Talks. University of Wisconsin - Madison, USA, June 2010.

21. C. D’Ambrosio, Water Network Design by Mixed Integer Nonlinear Programming, University of Heidelberg, Germany, December 2009.

22. C. D’Ambrosio, Methods and Algorithms for Solving Non-Convex MINLP Problems, IBM T.J. Watson Research Center. Yorktown Heights, USA, September 2007.

Conference Talks

1. C. D’Ambrosio, A. Frangioni, C. Gentile. Strengthening Convex Relaxations of Mixed Integer Non Linear Programming Problems with Separable Non Convexities. GOW’16, Braga, Portugal, September 2016.

2. C. D’Ambrosio, J. Linderoth, J. Luedtke, J. Schweiger. Strong Valid Inequalities for the Standard Pooling Problem, ICCOPT 2016. Tokyo, Japan, August 2016.

3. C. D’Ambrosio, J. Linderoth, J. Luedtke, A. Miller. Valid Inequalities for the Pooling Problem, ECCO 2015. Catania, Italy, May 2015.

4. C. D’Ambrosio. Presentation of the 2ème Prix Robert Faure 2015, Roadef 2015. Marseille, France, February 2015.

5. A. Conn, C. D’Ambrosio, L. Liberti, C. Lizon, D. Sinoquet, K. Vu Khac. A trust region method for solving grey-box MINLP, INFORMS 2014. San Francisco, USA, November 2014.

6. C. Buchheim, C. D’Ambrosio. A branch-and-bound method for box constrained integer polynomial optimization, IFORS 2014. Barcelona, Spain, July 2014.

7. C. Buchheim, C. D’Ambrosio. A Branch-and-Bound Method for Box-Constrained Mixed-Integer Polynomial Optimization Using Separable Underestimators, Roadef 2014. Bordeaux, France, February 2014.

8. C. Buchheim, C. D’Ambrosio. New underestimators for box constrained integer polynomial optimization, INFORMS 2013. Minneapolis, USA, October 2013.

9. C. Buchheim, C. D’Ambrosio. A fast exact method for box constrained polynomial optimization, ECCO 2013. Paris, France, May 2013.

10. W. van Ackooij, C. D’Ambrosio, G. Doukopoulos, A. Frangioni, C. Gentile, F. Roupin, T. Simovic. Optimality for Tough Combinatorial Hydro Valley Problems, Roadef 2013. Troyes, France, February 2013.

11. C. D’Ambrosio, A. Lodi, S. Martello, R. Rovatti. Optimistic modeling of non-linear optimization problems by mixed-integer linear programming, ISMP 2012. Berlin, Germany, August 2012.

12. C. D’Ambrosio, A. Lodi, S. Martello, R. Rovatti. Optimistically Approximating Non-linear Optimization Problems through MILP, EURO 2012. Vilnius, Lithuania, July 2012.

13. C. D’Ambrosio, A. Frangioni, L. Liberti, A. Lodi. Extending Feasibility Pump to nonconvex mixed integer nonlinear programming problems, Roadef 2012. Angers, France, April 2012.

14. C. D’Ambrosio, A. Frangioni, L. Liberti, A. Lodi. Feasibility Pump algorithms for nonconvex Mixed Integer Nonlinear Programming problems, APMOD 2012. Paderborn, Germany, March 2012.

15. C. D’Ambrosio, J. Linderoth, J. Luedtke. Valid Inequalities for the Pooling problem with Binary Variables, IPCO Conference. Armonk, NY, USA, June 2011.

16. C. D’Ambrosio, J. Linderoth, J. Luedtke. Pooling problem with binary variables, SIAM Conference on Optimization. Darmstadt, Germany, May 2011.

17. C. D’Ambrosio, J. Linderoth, J. Luedtke. Valid inequalities for the pooling problem with binary variables, XV International Workshop on Combinatorial Optimization. Aussois, France, January 2011.

18. C. D’Ambrosio, J. Linderoth, J. Luedtke, A. Miller. Pooling Problems with Binary Variables, Modeling and Optimization: Theory and Application (MOPTA) Conference. Bethlehem, USA, August 2010.

19. C. D’Ambrosio. Application-oriented Mixed Integer Non-Linear Programming, 24th European Conference on Operations Research. Lisbon, Portugal, July 2010.

20. C. D’Ambrosio, S. Martello. A heuristic algorithm for the general nonlinear separable knapsack problem, European Workshop on Mixed Integer Nonlinear Programming. Marseille, France, April 2010.

21. C. D’Ambrosio, J. Lee, A. Wächter. An exact algorithm for separable non-convex MINLPs, XIV International Workshop on Combinatorial Optimization. Aussois, France, January 2010.

22. C. D’Ambrosio, A. Frangioni, L. Liberti, A. Lodi. A Feasibility Pump Algorithm for Non-Convex Mixed Integer Non-Linear Programming Problems, XL AIRO Annual Conference. Siena, Italy, September 2009. (Invited Session organized by Andrea Lodi and myself.)

23. C. D’Ambrosio, A. Lodi. Application-oriented Mixed Integer Non-Linear Programming, XL AIRO Annual Conference. Siena, Italy, September 2009.

24. C. D’Ambrosio, J. Lee, A. Wächter. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity, 17th Annual European Symposium on Algorithms (ESA). Copenhagen, Denmark, September 2009.

25. C. D’Ambrosio, J. Lee, A. Wächter. An Algorithmic Framework for Separable Non-convex MINLP, 20th International Symposium of Mathematical Programming (ISMP). Chicago, USA, August 2009. (Invited Session organized by Pietro Belotti.)

26. C. D’Ambrosio, A. Lodi, S. Martello. On the piecewise linear approximation of functions of two variables in MILP models, AIRO Winter 2009. Cortina d’Ampezzo, Italy, January 2009.

27. C. D’Ambrosio, M. Fischetti, A. Lodi, A. Wächter. Finding Mixed-Integer Linear Solutions through Non-Linear Programming, XXXIX AIRO Annual Conference. Ischia, Italy, September 2008.

28. A. Borghetti, C. D’Ambrosio, A. Lodi, S. Martello. MILP techniques for solving hydro scheduling and unit commitment problem, XXXIX AIRO Annual Conference. Ischia, Italy, September 2008.

29. C. D’Ambrosio, J. Lee, A. Wächter. An Algorithmic Framework for a Class of Non-Convex MINLP Problems, SIAM Conference on Optimization 2008. Boston, USA, May 2008.

30. A. Borghetti, C. D’Ambrosio, A. Lodi, S. Martello. MILP Techniques for Solving Hydro Scheduling and Unit Commitment Problems, 2nd FIMA International Conference. Champoluc, Italy, January 2008.

31. C. Bragalli, C. D’Ambrosio, J. Lee, A. Lodi, P. Toth. An MINLP solution method for a Water Network Problem, XXXVII AIRO Annual Conference. Cesena, Italy, September 2006.

Posters

1. Wim van Ackooij, Claudia D’Ambrosio, Grace Doukopoulos, Antonio Frangioni, Claudio Gentile, Frédéric Roupin, Youcef Sahraoui, and Tomas Simovic. Optimality for Tough Combinatorial Hydro Valley Problems. PGMO Opening Conference, Palaiseau, France, October 2013.

2. Wim van Ackooij, Claudia D’Ambrosio, Grace Doukopoulos, Antonio Frangioni, Claudio Gentile, Frédéric Roupin, and Tomas Simovic. Optimality for Tough Combinatorial Hydro Valley Problems. PGMO Opening Conference, Palaiseau, France, September 2012.

3. C. D’Ambrosio, A. Frangioni, L. Liberti, A. Lodi. On Interval-subgradient and No-good Cuts. MIP 2010 Workshop on Mixed Integer Programming. Atlanta, USA, July 2010.

4. C. Bragalli, C. D’Ambrosio, J. Lee, A. Lodi, P. Toth. Water Network Design by MINLP. IMA Workshop on Mixed-Integer Nonlinear Optimization: Algorithmic Advances and Applications. Minneapolis, USA, November 2008.

5. C. D’Ambrosio, M. Fischetti, A. Lodi, A. Wächter. Non-Linear Programming based heuristic for Mixed-Integer Linear Programming. MIP 2008 Workshop on Mixed Integer Programming. New York, USA, August 2008.

Editorial Activities

Associate Editor of Optimization Methods & Software (since September 2016).

Associate Editor of Optimization and Engineering (since July 2016).

Guest Editor of EJOC (EURO Journal on Computational Optimization) for a special issue on “Nonlinear and Combinatorial Methods for Energy Optimization”.

Associate Editor of 4OR – A Quarterly Journal of Operations Research, jointly published by the Belgian, French, and Italian Operations Research Societies (since June 2014).

Professional Duties

Scientific committee member at EUROPT Workshop 2018, Almería, Spain (07/2018).

Organizer of the Energy stream at ISMP 2018, Bordeaux, France (07/2018).

Poster Award Committee member at MIP Workshop 2017, Montréal, Canada (06/2017).

President of the Ph.D. awarding committee of Faisal WAHID, École Polytechnique (06/2017).

Recruitment committee member for an Assistant Professor position at Paris Dauphine (05/2017).

Recruitment committee member for an adjunct assistant professor position (chargée d’enseignement) at École Polytechnique (04/2017).

Scientific committee co-chair of the Mathematical Optimization in the Decision Support Systems for Efficient and Robust Energy Networks Final Conference, 29-31 March 2017, Modena (Italy).

Recruitment committee member for a Professor position at the ENSTA (2016).

Head of Program “Optimisation”, École Polytechnique (since academic year 2015-2016).

Chef d’équipe (team leader) of the SYSMO team, LIX, École Polytechnique (2014-2017).

Project Board Member of TREND-X, Transdisciplinary analysis of Renewable Energies at École Polytechnique (since December 2014).

Organizing committee member of MINO/COST PhD School 2016.

Recruitment committee member for a Maître de Conférences position at the Laboratoire Jean Kuntzmann, Grenoble (2015).

Member of the Ph.D. awarding committee of Sellé Touré, Université de Grenoble (October 2014).

Recruitment committee member for a Maître de Conférences position at CNAM (2014).

Committee member of the Award for the Best EJOR Paper (EABEP) 2014.

Scientific committee member of ROADEF 2014.

Co-chair of the COST Workshop on Mixed Integer NonLinear Programming (CWMINLP) 2013.

Selection committee member for the LIX-Qualcomm-Carnot postdoc 2013-14.

Organizer of the Mixed-Integer Non Linear Programming stream at the EURO-INFORMS 2013.

Organizer of Mixed Integer Programming (MIP) workshop 2012.

Organizer of the Mixed-Integer Non Linear Programming stream at the EURO 2012.

Organizer of many sessions at the AIRO Annual Conference.

Scientific Community Contribution

Contributor of open-source solver ROSE (COIN-OR).

Contributor of open-source solver BONMIN (COIN-OR).

Teaching Experience

Responsible for the course: Big Data (INF442), École Polytechnique, France for the academic year 2017-2018.

Assistant for the course: Big Data (INF442), École Polytechnique, France for the academic years 2014-2015, 2015-2016, 2016-2017.

Co-responsible for the course: Initiation to Research, Master Parisien de Recherche Opérationnelle, France for the academic years 2013-2014, 2014-2015, 2015-2016, 2016-2017.

Lecturer and responsible for the course: Advanced Mathematical Programming (PMA), Master Parisien de Recherche Opérationnelle, France for the academic years 2012-2013, 2013-2014, 2014-2015, 2015-2016, 2016-2017.

Assistant for the course: Constraint Programming and Mathematical Programming (INF580), École Polytechnique, France for the academic years 2013-2014, 2014-2015.

Lecturer for the course: Introduction to C++ (INF585), École Polytechnique, France for the academic year 2012-2013.

Lecturer for the course: Optimization in Health Care of the Master in Ingegneria Clinica, COFIMP, Bologna (together with Professor Silvano Martello) for the academic years 2008-2009/2009-2010/2010-2011.

Tutor for the course: Operations Research M with Professor Silvano Martello at the Faculty of Engineering of the University of Bologna (Italy) for the academic years 2009-2010/2010-2011.

Tutor for the course: Fundamentals of Operations Research L-A with Professor Silvano Martello at the Faculty of Engineering of the University of Bologna (Italy) for the academic years 2006-2007/2007-2008/2008-2009/2009-2010.

Tutor for the course: Operations Research L-S with Professor Silvano Martello at the Faculty of Engineering of the University of Bologna (Italy) for the academic years 2006-2007/2007-2008/2008-2009.

Tutor for the course: Lab of Optimization Tools L with Professor Andrea Lodi at the Faculty of Engineering of the University of Bologna (Italy) for the academic years 2006-2007/2007-2008/2008-2009.

Tutor for the course: Lab of Operations Research L-A with Professor Andrea Lodi at the Faculty of Engineering of the University of Bologna (Italy) for the academic years 2006-2007/2007-2008/2008-2009.

Languages

Italian: mother tongue.

English: good knowledge.

French: B2.

Computer Skills

Programming Languages: C, C++, Java, JSP.

Operating Systems: Linux, Windows.

Software for Optimization: AMPL, Arena, COIN-OR solvers, Cplex, GAMS, Julia, JuMP, LaTeX, Matlab, MPL, Porta, SCIP.

PhD Students

Current

Gabriele Iommazzo: PhD co-advisor.

Past

Luca Mencarelli: PhD co-advisor (European Project MINO).

Olivier Wang: PhD co-advisor (thesis CIFRE with IBM).

Youcef Sahraoui: PhD co-advisor (thesis CIFRE with EDF).

Claire Lizon: PhD co-advisor (thesis CIFRE with IFPEN).

Master and Bachelor Students

Current

Francesco Vita (Università Politecnica delle Marche, Italy).

Past

Gabriele Iommazzo: co-advisor of the Master thesis in Master of Science in Business Informatics (2017). Now a PhD student at École Polytechnique.

Angelo Di Zio: co-advisor of the Master thesis in Computer Science Engineering (2012). Now at Ferrari S.p.A.

Enrico Galassi: co-advisor of the Bachelor thesis in Computer Science Engineering (2011).

Simone Bacchilega: co-advisor of the Bachelor thesis in Computer Science Engineering (2011).

Michele Dinardo: co-advisor of the Master thesis in Computer Science Engineering (2010). Now at KPMG.

Gianluca Costa: co-advisor of the Bachelor thesis in Computer Science Engineering (2009). Now at YOOX Group.

Paolo Magini: co-advisor of the Master thesis in Industrial Engineering (2006). Now at Reply.

Part II

Research Summary


Chapter 1

Introduction to MINLP Problems and Methods

Mathematical Programming (MP) allows the modeling of an extremely wide variety of optimization problems. From the user’s point of view, this means that instead of thinking about how to solve a problem, it suffices to think about how to formulate it: the solution is delegated to one of the several “general purpose solvers” available in the scientific literature. This “formulate-and-solve” approach makes the practice of optimization much more accessible to the average practitioner. However, keeping the characteristics of the solution method in mind while modeling is a non trivial task that can drastically influence the performance of general purpose solvers. Within MP, Mixed-Integer Non Linear Programming (MINLP) is currently one of the largest classes of problems for which a general purpose solver exists. MINLPs may involve linear and non linear functions of continuous, binary, and integer decision variables. As such, the class captures an incredibly diverse variety of optimization problems. Real problems arising in engineering, industry, or the natural sciences often come with a lot of technical overhead, which somehow obfuscates the essential underlying difficulties. We use two types of problem models: decision and optimization problems. The difference between a decision and an optimization problem concerns the type of admissible answer. A decision problem is a formal question that only admits “yes” or “no” as an answer; a solution consists of the answer together with a certificate (or formal proof) of correctness, and this certificate need not be unique. An optimization problem is also endowed with an “objective function” that maps certificates to scalars, and an optimization direction (either minimization or maximization). One then wants to find the best solution of the problem in terms of the objective function value.
From a complexity viewpoint, MINLP is an NP-hard combinatorial problem, because it includes MILP [62], and it is typically very challenging in practice as well. Worse, non convex integer optimization problems are in general undecidable, see [33]. In that paper, the author presents an example of a quadratically constrained integer program and shows that no computing device exists that can compute the optimum for all problems in this class. In the remainder, we concentrate on the case where MINLP is decidable, which we can achieve by ensuring that each variable has finite lower and upper bounds.


In this work, we are mainly interested in techniques that improve the phase of solving optimization problems. In this chapter, we summarize the main classic aspects of formulations and algorithms in mathematical programming.

1.1 Modeling problems by Mathematical Programming

We could define the most general optimization problem that can be modeled as an MP problem as follows: given positive integers $n, n_1, n_2, m$, an objective function $f : \mathbb{R}^{n_1+n_2} \to \mathbb{R}$, and a set of functions $g_i : \mathbb{R}^{n_1+n_2} \to \mathbb{R}$ for any $i = 1, \dots, m$, the Mathematical Programming (MP) problem is:

\[
\begin{array}{rll}
\min\limits_{x} & f(x) & \\
 & g_i(x) \le 0 & \forall i = 1, \dots, m \qquad (1.1)\\
 & x \in X &
\end{array}
\]

where $X \subseteq \mathbb{R}^{n_1} \times \mathbb{N}^{n_2}$ and $n = n_1 + n_2$. Notice that MP requires the functions $f, g = \{g_1, \dots, g_m\}$ as part of its input; this implies the existence of an uncountably infinite set of instances. The vector $x = (x_1, \dots, x_n)$ is known as the decision vector, and each of its components is a decision variable. The inequality $g_i(x) \le 0$ is called a constraint. We can focus on minimization problems and on inequality constraints, and neglect equalities, without loss of generality. Expressing a problem as a subclass of MP instances is also called “modeling a problem by MP”. Most often, an MP model of a given problem fixes $f, g$ to specific functional forms expressed in terms of parameters, and the solution is sought in the decision vector $x$. A (potentially infinite) set of MP instances where $f, g$ have been fixed to given functional forms is called a formulation.
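As a small illustrative instance (ours, not taken from the original text), take $n_1 = n_2 = 1$ and $m = 1$:

\[
\min_{x} \ (x_1 - 2)^2 + x_2 \quad \text{s.t.} \quad 1 - x_1 - x_2 \le 0, \quad x \in \mathbb{R} \times \mathbb{N},
\]

i.e., $f(x) = (x_1 - 2)^2 + x_2$ and $g_1(x) = 1 - x_1 - x_2$. Here the optimum is $x^* = (2, 0)$ with value $0$: the point satisfies $1 - 2 - 0 \le 0$, and both terms of the objective are nonnegative on the feasible set.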

1.1.1 Reformulations

An MP reformulation is a symbolic algorithm for transforming an MP formulation into another one with more desirable features. Reformulations are a tool to address the following question: any problem can be modeled as an MP in infinitely many ways; is there a best formulation for a given algorithm? The concept of “best” is relative: in particular, we refer either to more tractable reformulations, i.e., reformulations that are easier to solve, or to stronger reformulations, i.e., reformulations whose tractable relaxation is closer to the original problem. In the next two sections, we introduce the notions of relaxations and approximations, which can, to an extent, be considered reformulations too, see [41].
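As a classic example of an exact reformulation (our illustration, not from the text), the non linear product $z = xy$ of two binary variables can be replaced by the three linear constraints $z \le x$, $z \le y$, $z \ge x + y - 1$. A brute-force check over all binary assignments, sketched below, confirms the equivalence:

```python
from itertools import product

def linearized_z(x, y):
    """Binary z values feasible for z <= x, z <= y, z >= x + y - 1."""
    return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]

# For every binary (x, y), the linear system admits exactly z = x*y,
# so the reformulation is exact (neither a relaxation nor an approximation).
for x, y in product((0, 1), repeat=2):
    assert linearized_z(x, y) == [x * y]
print("exact linearization verified on all binary points")
```

This is the standard McCormick-style linearization restricted to binaries; the resulting formulation is linear and hence directly amenable to MILP solvers.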

1.1.2 Relaxations

Relaxations are formulations that “relax” the original problem, i.e., optimization problems of the form

\[
\begin{array}{rll}
\min\limits_{y} & \bar{f}(y) & \\
 & \bar{g}_i(y) \le 0 & \forall i = 1, \dots, r \qquad (1.2)\\
 & y \in Y &
\end{array}
\]

where $X \subseteq Y \subseteq \mathbb{R}^{q_1} \times \mathbb{N}^{q_2}$, $q_1 \ge n_1$, $q_2 \ge n_2$, $\bar{f}(y) \le f(x)$ for all $x \subseteq y$, and $\{x \mid g(x) \le 0\} \subseteq \mathrm{Proj}_x\{y \mid \bar{g}(y) \le 0\}$. The typical advantage of solving (1.2) instead of (1.1) is that the former is easier (typically both from a theoretical and practical viewpoint) than the latter. In particular, the relaxation is usually built in such a way that, if the optimal solution of the relaxation is feasible for the original problem, then it is also optimal for it. For this reason, relaxations play an important role in algorithms aimed at solving MP problems, as explained in Sections 1.2, 1.3, and 1.4. Some classic relaxations are:

- continuous: obtained by relaxing $x \in \mathbb{R}^{n_1} \times \mathbb{N}^{n_2}$ to $x \in \mathbb{R}^n$;
- linear: obtained by defining $\bar{f}(y) = c^\top y$ and $\bar{g}(y) = Ay - b$;
- convex: obtained by defining $\bar{f}$ and $\bar{g}$ to be convex functions.

The classic relaxation of a specific formulation can be strengthened by adding constraints to the formulation itself. In particular, the aim of the strengthening is to make the relaxation closer to the original problem, which helps speed up the classic algorithms. As a practical example, the strengthening of a continuous/linear/convex relaxation of an MI(N)LP problem aims at characterizing the convex hull of the original problem. Examples of strengthening for special classes of MINLPs are presented in Section 2.1. In the MILP case, if the constraints that characterize the convex hull of the problem are provided to a branch-and-bound solver, it will solve the problem at the root node without exploring further nodes, since (in non degenerate cases) the continuous relaxation solution already satisfies the integrality property. However, finding the convex hull of a MILP is typically hard, and it could be characterized by an exponential number of constraints.
Thus, the strengthening of the continuous relaxation of a MILP aims at getting as close as possible to the convex hull of the MILP while keeping the relaxation tractable.
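As a toy numeric illustration of the continuous relaxation (our example, not from the text), consider a small 0-1 knapsack instance: its LP relaxation can be solved greedily by value/weight ratio, and, since this is a maximization problem, the relaxation value is an upper bound on the integer optimum:

```python
from itertools import product

values   = [10, 13, 7, 8]
weights  = [4, 6, 3, 5]
capacity = 9

# Integer optimum by enumeration of all 0/1 vectors (exact for this tiny instance).
int_opt = max(
    sum(v * s for v, s in zip(values, sel))
    for sel in product((0, 1), repeat=len(values))
    if sum(w * s for w, s in zip(weights, sel)) <= capacity
)

def lp_relaxation_bound(values, weights, capacity):
    """Continuous relaxation of the 0-1 knapsack, solved greedily by ratio."""
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    bound, room = 0.0, capacity
    for v, w in items:
        take = min(1.0, room / w)     # fractional amount of the item
        bound += take * v
        room -= take * w
        if room <= 0:
            break
    return bound

lp_bound = lp_relaxation_bound(values, weights, capacity)
print(int_opt, lp_bound)
assert lp_bound >= int_opt            # relaxation bound dominates the integer optimum
```

In a branch-and-bound algorithm, this bound is exactly what allows nodes to be fathomed.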

1.1.3 Approximations

Approximations are reformulations of a given problem for which not all (or none) of the conditions required for relaxations hold, i.e.,

\[
\begin{array}{rll}
\min\limits_{z} & \tilde{f}(z) & \\
 & \tilde{g}_i(z) \le 0 & \forall i = 1, \dots, s \qquad (1.3)\\
 & z \in Z &
\end{array}
\]

where $Z \subseteq \mathbb{R}^{p_1} \times \mathbb{N}^{p_2}$. The typical advantage of solving (1.3) instead of (1.1) is that the former is much easier (typically both from a theoretical and practical viewpoint) than the latter. As no particular property holds that links the variables and functions of the original problem to those of its approximation, we have no guarantee that, if an optimal solution of the approximation is feasible for the original problem, then it is optimal for it too. Keeping this in mind, approximations are in general useful when they are conceived to be as “close as possible” to the original problem but, at the same time, tractable. Note that further issues arise when solving an approximation of an optimization problem; however, their impact depends on what we approximate. For example, when the approximation differs from the original problem only in the objective function, the solution of the approximation will always be feasible for the original problem, but possibly non optimal. On the contrary, approximating the feasible region of the original problem typically yields solutions that might be infeasible for the original problem. This is also true for relaxations. An improvement of the approximation (like a strengthening of a relaxation) might lead to identifying a feasible solution for the original problem. However, in general, this process is not guaranteed to converge to a feasible solution, while this is typically the case for relaxation strengthening procedures like branching, cutting planes, etc. (see Section 1.4 for further details). A generic and a tailored approximation method are discussed in Sections 3.1.2 and 3.2.1, respectively.
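A tiny numeric illustration of approximating a feasible region (our sketch, not from the text): replacing the unit disk $x^2 + y^2 \le 1$ by the polygon formed by a few tangent lines makes the problem linear, but the optimizer of the approximation can violate the original non linear constraint:

```python
import math

# Original problem: maximize x + y over the unit disk x^2 + y^2 <= 1.
# True optimal value: sqrt(2), attained at x = y = 1/sqrt(2).

def outer_polygon_vertices(k):
    """Vertices of the polygon defined by k tangent lines to the unit circle
    (tangents at angles 2*pi*i/k); adjacent tangents meet at radius 1/cos(pi/k)."""
    r = 1.0 / math.cos(math.pi / k)
    return [(r * math.cos(2 * math.pi * (i + 0.5) / k),
             r * math.sin(2 * math.pi * (i + 0.5) / k)) for i in range(k)]

verts = outer_polygon_vertices(6)
best = max(verts, key=lambda p: p[0] + p[1])   # an LP optimum is attained at a vertex
approx_opt, true_opt = best[0] + best[1], math.sqrt(2)

print(approx_opt, true_opt)
assert approx_opt > true_opt                   # over-estimates the true optimum
assert best[0] ** 2 + best[1] ** 2 > 1         # and is infeasible for the disk
```

Since the polygon contains the disk, this particular approximation is in fact a relaxation; an inner polygon would instead return feasible but possibly suboptimal points.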

1.1.4 Solution methods

No algorithm currently exists that can solve any MP instance efficiently (even just from an empirical point of view). Instead, there are algorithms for solving subclasses of the MP problem, such as, for example, Linear Programs (LP), Integer Linear Programs (ILP), Mixed-Integer Linear Programs (MILP), convex Non Linear Programs (cNLP), and others. In general, MP algorithms are divided into two classes: exact and heuristic. Informally speaking, exact algorithms provide a guarantee that the solution is globally optimal, whereas heuristic algorithms do not. Heuristic algorithms are usually conceived using some sort of “common sense” intuition and aim at finding good quality solutions in a short amount of time. Thus, they are widely used in practice to solve real-world applications. If the formulation involves non linear functions and continuous variables, it is likely that solutions may not be representable exactly using floating point numbers; we still talk about “exact algorithms” in this setting if they provide an $\varepsilon$ guarantee on the optimality of the (approximated) solution they find. More precisely, let $x^*$ be a global optimum of a MINLP problem whose components involve irrational numbers, and let $A$ be an algorithm that can find an approximated solution $\bar{x}$ such that $|f(\bar{x}) - f(x^*)| < \varepsilon$. Then we still include $A$ in the class of exact algorithms. In the next three sections, we introduce fundamentals of the classic methods that are used to solve MILP, NLP, and MINLP problems, respectively.

1.2 Mixed Integer Linear Programming Technology

In this section we present considerations about MILP solvers, limited to aspects related to MINLP¹. Until recent years, the standard approach for handling MINLP problems (especially by practitioners or scientists with a combinatorial optimization background) has basically been to solve an MILP approximation of it, i.e., problem (1.3) with $\tilde{f}$ and $\tilde{g}$ linear. Let us stress again that solving an MILP approximation of an MINLP does not necessarily mean solving a relaxation of it: the optimal solution of the MILP might be neither optimal nor feasible for the original problem, if no assumption is made on the MINLP structure. A standard technique to approximate special classes of MINLP problems with an MILP model is, for example, piecewise linear approximation; when the non linear functions are univariate, Special Ordered Sets of type 2 can be defined to model the convex combinations that form the piecewise linear approximation. By defining a set of variables to be a special ordered set of type 2, one imposes that at most 2 such variables can take a nonzero value and that they must be adjacent (see, e.g., Beale and Tomlin [5]). Some of the available MILP solvers recognize the definition of special ordered sets of type 1 and 2 and treat these sets of variables through special branching rules. However, it is possible to use any kind of MILP solver and to define a piecewise linear approximation of a general MINLP by adding suitable binary variables, see, for example, [72]. In the presence of special properties of the approximated functions, (even) a standard piecewise linear approximation might result in a relaxation of the original problem (see, for example, Section 3.1.2). However, this is, in general, not the case. Alternatively, methods tailored for solving MINLPs can be employed, which consist in solving MILP relaxations of the MINLP problem.
Algorithms in this category are, for example, outer approximation, generalized Benders decomposition, and extended cutting plane (see, for example, Grossmann [28]). In these cases, an MILP is obtained starting from the original MINLP, but the approximation is constructed on purpose to provide a relaxation, and it is tightened at runtime. Contributions in this field and examples of practical applications are discussed in Section 3.1. From a software viewpoint, another important role MILP plays in the solution of MINLP problems is that some MINLP techniques use MILP solvers as black-boxes to address carefully defined MILP subproblems. These techniques are described as well in Section 1.4. Finally, we conclude this section by mentioning some MILP solvers, like the commercial ones IBM-CPLEX and XPRESS Optimizer, and the open-source solvers CBC and SCIP. For a survey of the development of methods and software for MILPs the reader is referred, e.g., to Lodi [48], and, for detailed discussions, to [1] and [45].
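The SOS2-based piecewise linear approximation described above can be sketched numerically (our illustration; in an actual MILP model, the two adjacent weights computed below would be the only nonzero SOS2 convex-combination variables):

```python
import bisect

def pwl_sos2(breakpoints, fvals, x):
    """Evaluate the piecewise linear approximation of f at x as a convex
    combination of at most 2 adjacent breakpoints (the SOS2 condition)."""
    i = bisect.bisect_right(breakpoints, x) - 1
    i = min(max(i, 0), len(breakpoints) - 2)   # clamp to a valid segment
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    lam = (x1 - x) / (x1 - x0)                 # weight on the left breakpoint
    return lam * fvals[i] + (1 - lam) * fvals[i + 1]

# Approximate f(x) = x^2 on [0, 2] with 5 breakpoints.
bp = [0.0, 0.5, 1.0, 1.5, 2.0]
fv = [b * b for b in bp]
for x in (0.25, 0.8, 1.9):
    print(x, round(pwl_sos2(bp, fv, x), 4), x * x)
# For a convex f, the secant over-estimates f between breakpoints, which is
# why such an approximation is, in general, not a relaxation of a minimization problem.
```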

¹Note that we do not consider MINLPs that can be exactly reformulated as MILP problems.

1.3 Non Linear Programming Technology

Non Linear Programming subproblems play a fundamental role in the solution of MINLPs. Different approaches to the solution of NLPs are possible; thus, different types of solvers are available, both commercial and open-source. The most commonly used and effective NLP solvers need the first and usually second derivatives of the non linear functions involved in the model; the non linear functions are thus assumed to be smooth. However, information about derivatives sometimes cannot be computed. In this case, derivative free optimization methods can be applied. For details on this topic, the reader is referred to the book by Conn et al. [14]. An additional aspect that NLP solvers typically have in common is that they only guarantee that the solution provided is a local optimum. Thus, when we use standard NLP solvers, we might not obtain the globally optimal solution. This is why some MINLP solvers are exact algorithms for convex MINLPs (where local optima of their continuous relaxation are also global optima) but only heuristics for non convex MINLPs. We discuss this issue further in Section 1.4. An alternative would be solving NLP relaxations of (non convex) MINLPs to optimality by means of global optimization algorithms. Of course, that would come at the cost of the solver’s efficiency, and, for this reason, global optimization techniques are usually not used in MINLP solvers for handling the NLP subproblems. Any serious discussion about NLP algorithms would lead us outside the scope of the present manuscript. We refer the reader to Nocedal and Wright [54], where, at the end of each chapter describing an NLP method, a discussion about software is also provided. Examples of the most widely used solvers are CONOPT, FILTERSQP, IPOPT, KNITRO, MINOS, MOSEK, NPSOL, SNOPT. For an overview of NLP software, the reader is referred to Leyffer and Mahajan [37].
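The local-optimality caveat is easy to reproduce (our sketch, not tied to any specific solver): plain gradient descent on a non convex univariate function converges to different stationary points depending on the starting point:

```python
def grad_descent(df, x0, lr=0.01, iters=20000):
    """Plain gradient descent: it only converges to a *local* stationary point."""
    x = x0
    for _ in range(iters):
        x -= lr * df(x)
    return x

# Non convex objective with two local minima: f(x) = x^4 - 3 x^2 + x.
f  = lambda x: x ** 4 - 3 * x ** 2 + x
df = lambda x: 4 * x ** 3 - 6 * x + 1

left  = grad_descent(df, -2.0)   # reaches the global minimum (near x = -1.3)
right = grad_descent(df,  2.0)   # trapped in a worse local minimum (near x = 1.1)
print(left, f(left), right, f(right))
assert f(right) > f(left)        # the returned "optimum" depends on the start
```

This is precisely why an MINLP branch-and-bound that relies on such a local method may fathom nodes by comparison with an invalid bound.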

1.4 Mixed Integer Non Linear Programming Technology

In this section, we describe the most commonly used algorithms for solving non convex MINLPs. We do not describe special-purpose methods, such as, for example, those for Semi-Definite programming, mixed integer Quadratic programming, or convex MINLPs (MINLPs for which the continuous relaxation has a feasible region that is a convex set), but only general-purpose MINLP techniques. The interested reader is referred to (among many possible references) the recent MINLP software survey in the Wiley Encyclopedia of Operations Research and Management Science [11] and to the recent comprehensive introduction to MINLP [6]. For a collection of some of the most exciting recent advances on MINLP, we refer the reader to [35].

Coming back to problem (1.1), in the case of convex MINLPs, extensions of MILP algorithms such as branch-and-bound, cutting planes, or branch-and-cut have been proposed, see, for example, [16, 29, 25, 59, 60]. However, if we do not make any convexity assumption on the objective function and the constraints, then one of the main issues is that the NLP problems obtained by dropping the integrality requirements have, in general, local minima which are not global. This implies that, if the NLP method used to solve the NLP subproblems of the branch-and-bound/cut does not guarantee that the solution provided is a global optimum (and this is usually the case for the main NLP solvers, e.g., those discussed in Section 1.3), feasible and even optimal solutions might be fathomed, for example, by comparison with an invalid lower bound. Moreover, the linearization cuts used by classic algorithms for convex MINLP are in general not valid for non convex constraints. It follows that the linearization cuts might cut off not only infeasible points but also parts of the feasible region, as depicted in Figure 1.1. Adding the cut represented by the dashed line

Figure 1.1: Example of "unsafe" linearization cut generated from a non convex constraint.

in the figure would result in cutting off the upper part of the non convex feasible region, i.e., the part above the cut. For this reason, if non convex constraints are involved, one has to use linearization cuts carefully or accept that the final solution might not be an optimal one.

As for the methods that specifically handle non convexities, the first approach is to fully reformulate the problem. An exact reformulation is possible only in limited cases of non convex MINLPs and allows one to obtain an equivalent convex formulation of the non convex MINLP in the sense that: (i) all feasible $x \in \mathbb{R}^{n_1} \times \mathbb{N}^{n_2}$ are mapped to unique and

feasible points of the feasible region of the convex formulation; and (ii) no other mixed integer (infeasible) solution $x \in \mathbb{R}^{n_1} \times \mathbb{N}^{n_2}$ belongs to it. All the techniques tailored for convex MINLPs can then be (safely) applied to the reformulated MINLP. For a detailed description of exact reformulations to standard forms, see, for example, Liberti et al. [42].

In this work, we focus on a second approach, which applies to a larger subset of non convex MINLPs, based on the use of convex envelopes or underestimators of the non convex feasible region, i.e., a relaxation like (1.2) where $\bar{f}$ and $\bar{g}$ are convex (or linear). Roughly speaking, the first step to reformulate the non convex problem (1.1) is to associate a convex relaxation with each part of the original model. This is obtained by adding auxiliary variables that "linearize" the non linearities of the original constraints. This way, complex non linearities are subdivided into simpler non linearities, at the cost of additional variables and constraints. The basic non linearities are then relaxed using simple envelopes/underestimators. This type of reformulation by pieces only applies to factorable functions, i.e., functions that can be expressed as summations and products of univariate functions after appropriate recursive splitting. More precisely, a function f is factorable if it can be rewritten as:

$$f(x) = \sum_h \prod_k f_{hk}(x),$$

where the $f_{hk}(x)$ are univariate functions for all h, k. These functions can be reduced and reformulated into simpler non linear functions, adding auxiliary variables and constraints

such as $s_{hk} = f_{hk}(x)$, $s'_h = \prod_k s_{hk}$, $s'' = \sum_h s'_h$. This modeling trick allows reformulating the model through predetermined operators for which convex underestimators are known, such as, for example, bilinear, trilinear, or fractional terms (see, e.g., [40, 50, 68]).

The use of underestimators makes the feasible region larger: if the optimal solution of the relaxation is feasible for the non convex MINLP, then it is also its global optimum. Otherwise, a refinement of the underestimation is needed. This is done by branching, not only on integer variables but also on continuous ones, as depicted in Figure 1.2. This technique allows one to obtain a lower bound on the non convex MINLP optimum that can be used within specialized branch-and-bound algorithms for global optimization, namely spatial branch-and-bound [68, 36, 38, 7], branch-and-reduce [63, 64, 70], α-BB [4, 2], and branch-and-cut [34, 56]. Such algorithms provide a solution that is guaranteed to be a global optimum. These specialized branch-and-bound methods for global optimization mainly differ in the techniques used to strengthen the relaxations and in the branching scheme adopted: they might either concentrate first on discrete variables (until an integer feasible solution is found) and then branch on continuous variables, or branch on both variable types without a prefixed priority. From the algorithmic viewpoint, one could also branch first on continuous variables appearing in non convex terms (see, e.g., [3] for details), but, to the best of our knowledge, this option is not available in the solvers. It is clear that algorithms of this type are very time-consuming in general and can be employed only for small- to medium-size instances. This is the price one has to pay for the guarantee of global optimality of the solution provided.
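As a concrete instance of the envelopes mentioned above, the well-known McCormick inequalities [50] for a single bilinear term w = xy over a box can be sketched as follows (a self-contained numerical check, not code taken from any specific solver):

```python
def mccormick(x, y, xL, xU, yL, yU):
    """Return the McCormick lower/upper envelope values for w = x*y
    at the point (x, y), given box bounds xL <= x <= xU, yL <= y <= yU."""
    lower = max(xL*y + x*yL - xL*yL,   # the two underestimating planes
                xU*y + x*yU - xU*yU)
    upper = min(xU*y + x*yL - xU*yL,   # the two overestimating planes
                xL*y + x*yU - xL*yU)
    return lower, upper

# the envelope sandwiches the true product everywhere in the box
xL, xU, yL, yU = -1.0, 2.0, 0.5, 3.0
for i in range(11):
    for j in range(11):
        x = xL + (xU - xL) * i / 10
        y = yL + (yU - yL) * j / 10
        lo, hi = mccormick(x, y, xL, xU, yL, yU)
        assert lo <= x*y + 1e-9 and x*y <= hi + 1e-9
```

Spatial branching shrinks the box, which tightens both planes; this is exactly the refinement mechanism exploited by the branch-and-bound schemes listed above.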

Figure 1.2: Linear underestimators before and after branching on continuous variables.
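The effect illustrated by Figure 1.2 can be reproduced numerically: for a concave term, the secant (chord) is a valid linear underestimator, and branching on the continuous variable replaces one secant by two tighter ones. A small sketch, using g(x) = sqrt(x) as a stand-in concave term (the function and branching point are illustrative):

```python
import math

def g(x):
    return math.sqrt(x)   # a concave term of the model

def secant(a, b, x):
    """Linear underestimator of the concave g on [a, b] (the chord)."""
    return g(a) + (g(b) - g(a)) * (x - a) / (b - a)

def max_gap(pieces, n=1000):
    """Worst underestimation error of a piecewise-secant relaxation."""
    worst = 0.0
    for a, b in pieces:
        for i in range(n + 1):
            x = a + (b - a) * i / n
            worst = max(worst, g(x) - secant(a, b, x))
    return worst

gap_before = max_gap([(0.0, 4.0)])               # one secant on the whole interval
gap_after = max_gap([(0.0, 1.0), (1.0, 4.0)])    # after branching at x = 1
```

Here gap_after is strictly smaller than gap_before, so the lower bound computed from the relaxation improves after the branching step.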

From an implementation viewpoint, some complex structures are needed for implementing global optimization algorithms, such as the so-called symbolic expression trees (see Smith and Pantelides [68] for their use in (global) optimization). For a detailed description of the issues arising in developing and implementing a global optimization algorithm, the reader is referred to Liberti [40]. The documentation of the specific solvers is also very useful. For an overview of the ways to define functions $\bar{f}$ and $\bar{g}$ for non convex functions f and g with a specific structure, the reader is referred to [38, 39, 50, 55]. It is also worth mentioning that some of the primal heuristic algorithms developed for MILPs or convex MINLPs have been generalized to the non convex setting by specifically taking into account the non convexity issues described above. This is a very active research topic on which several contributions are under way, see, e.g., [44, 19, 17, 53], to mention a few, and Section 3.1.1. Finally, as we did in the previous two sections, we list some general-purpose solvers for non convex MINLP problems: ANTIGONE, BARON, Couenne, LINDO-global, and SCIP.

In summary, the mathematical structure of the formulation has a major impact on solver performance. Moreover, reformulations can be applied in sequence so as to guarantee certain mathematical properties of the resulting formulation (for example, a sequence of exact reformulations results in an exact reformulation [41]).

Chapter 2

Exact Methods for MINLP problems

In this chapter, we present exact methods for special classes of MINLP problems. As mentioned in Section 1.4, in this manuscript we call "exact" those methods that are guaranteed to converge to a global optimum within an ε-tolerance; thus, we do not refer to exact rational solvers such as, for example, [15].

This chapter is divided into two parts. Section 2.1 is devoted to methods aimed at strengthening classic relaxations such as continuous ones. In particular, we focus on special subclasses of MINLPs, i.e., the Euclidean Steiner Tree problem (2.1.1) and the Pooling problem (2.1.2). In Section 2.2 we describe two methods, namely SC-MINLP (2.2.1) and Polyopt (2.2.2), which are based on alternative relaxations of two special classes of separable MINLPs, i.e., MINLPs with separable non convexities and box-constrained polynomial optimization problems. The alternative relaxations are obtained by exploiting the structure and properties of the specific class of problems, which makes the methods tailored and not straightforwardly generalizable, but more effective than the general methods. In both sections, the final aim is to explore the advantages and disadvantages of standard and non standard approaches that deeply exploit the structure of special subclasses of mixed integer non linear programs.

2.1 Methods based on strengthening of classic relaxations

As mentioned in Chapter 1, the general-purpose methods for MINLP problems are based on classic relaxations such as continuous and linear/convex relaxations. In this section we introduce two problems, i.e., the Euclidean Steiner Tree problem and the Pooling problem, and two solution approaches that aim at strengthening their classic relaxations.


2.1.1 Euclidean Steiner Tree1

The Euclidean Steiner Tree problem (ESTP) is a challenging optimization problem formally defined as follows: given a set of points in $\mathbb{R}^n$, called terminals, find a tree of minimal Euclidean length spanning these points, possibly using additional points, called Steiner points. The optimal solution of the ESTP is called a Steiner minimal tree (SMT) and is characterized by the following properties:

• each Steiner point in an SMT has degree 3;

• a Steiner point and its adjacent nodes lie in a plane, and the angles between the edges connecting the point to its adjacent nodes are 120 degrees;

• a terminal in an SMT has degree between 1 and 3;

• an SMT on p terminals has at most p − 2 Steiner points (see [31, 24]).

Examples of Steiner Minimal Trees are represented in Figure 2.1, where the blue dots represent the terminals and the red dots represent the Steiner points.

Figure 2.1: Euclidean Steiner Trees with n = 3 and n = 4.

A non convex MINLP formulation of the ESTP was proposed by Maculan et al. [49]. Before presenting the formulation, let us introduce some notation. Let $P := \{1, 2, \ldots, p\}$ be the indices associated with the given terminals $a^1, a^2, \ldots, a^p$, and let $S := \{p+1, p+2, \ldots, 2p-2\}$ be the indices associated with the Steiner points, whose coordinates in $\mathbb{R}^n$ are represented by $x^{p+1}, x^{p+2}, \ldots, x^{2p-2}$. Let G = (V, E) be the corresponding graph with $V = P \cup S$. Let us denote by (i, j) an edge of G, with $i, j \in V$ such that $i < j$, and define $E := E_1 \cup E_2$, where $E_1 := \{(i, j) : i \in P, j \in S\}$ and $E_2 := \{(i, j) : i \in S, j \in S\}$. We introduce binary variables $y_{ij}$ for each edge $(i, j) \in E$, where $y_{ij} = 1$ if the edge (i, j) is present in the SMT and 0 otherwise.

1This section summarizes the contribution proposed in: C. D'Ambrosio, M. Fampa, J. Lee, S. Vigerske. On a Nonconvex MINLP Formulation of the Euclidean Steiner Tree Problem in n-Space, Lecture Notes in Computer Science, vol. 9125, International Symposium on Experimental Algorithms SEA 2015, E. Bampis ed., pp. 122–133, 2015.

The ESTP is then formulated as follows:

$$\text{(MMX)} \quad \min \sum_{(i,j) \in E_1} \|a^i - x^j\| \, y_{ij} + \sum_{(i,j) \in E_2} \|x^i - x^j\| \, y_{ij} \quad (2.1)$$

$$\sum_{j \in S} y_{ij} = 1 \quad \forall i \in P \quad (2.2)$$

$$\sum_{i \in P} y_{ij} + \sum_{k<j,\, k \in S} y_{kj} + \sum_{k>j,\, k \in S} y_{jk} = 3 \quad \forall j \in S \quad (2.3)$$

$$\sum_{k<j,\, k \in S} y_{kj} = 1 \quad \forall j \in S \setminus \{p+1\} \quad (2.4)$$

the sum of the distances with respect to the closest terminals to i and j, respectively, then they cannot be connected just through two Steiner points.

Note that we also reformulate MMX and all the additional valid inequalities so as to get rid of the non differentiability introduced by the Euclidean norm, which might prevent NLP solvers from converging. The smoothing was carried out carefully so as to obtain a relaxation of MMX. The reader is referred to the published paper for the formal and complete definition of the valid inequalities and for details on the smoothing procedure.

To test the effectiveness of the valid inequalities, we first provided the MMX model to a global optimization solver, namely SCIP. We then provided the MMX model augmented with different sets of the new valid inequalities, so as to test the effectiveness of each of them separately. Finally, we selected the three most effective valid inequality types and added all of them to MMX. The computational results are promising: the lower bounds were better and the running times shorter.
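One standard smoothing trick for the Euclidean norm (shown here as a generic sketch; the exact scheme used in the paper may differ) replaces the non differentiable $\|d\|$ by $\sqrt{\|d\|^2 + \varepsilon^2} - \varepsilon$, which is differentiable everywhere, never exceeds $\|d\|$, and deviates from it by at most $\varepsilon$; substituting it in a minimization objective therefore yields a smooth relaxation:

```python
import math

def norm(d):
    return math.sqrt(sum(v * v for v in d))

def smooth_norm(d, eps=1e-4):
    """Smooth underestimator of the Euclidean norm: differentiable at d = 0."""
    return math.sqrt(sum(v * v for v in d) + eps * eps) - eps

# the smoothed norm underestimates the true norm, with error at most eps
for d in [(0.0, 0.0), (1.0, 2.0), (-0.3, 0.4), (1e-6, 0.0)]:
    assert smooth_norm(d) <= norm(d) + 1e-12
    assert norm(d) - smooth_norm(d) <= 1e-4
```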

2.1.2 The pooling problem and its variant with binary variables2

The pooling problem consists of finding the optimal composition of final products obtained by blending in pools different percentages of raw materials. Formally, we are given a directed graph G = (V, A), where V is the set of vertices, partitioned into three sets: the set of inputs or raw materials I, the set of pools or intermediate products L, and the set of outputs or final products J. Arcs $(i, j) \in A$ link inputs to pools or outputs, and pools to outputs; see, for example, Figure 2.2. Each node and arc is subject to capacity constraints, but the "complicating constraints" concern the requirements on the quality of certain attributes of the final products. The quality is a linear combination of the attributes of the raw materials and intermediate products that compose the final product. As the quality of the attributes of the intermediate products is not known in advance, these constraints involve bilinear terms. We introduce the PQ-formulation [61, 69], the strongest known formulation for the standard pooling problem.

2This section summarizes the contribution presented in: J. Luedtke, C. D'Ambrosio, J. Linderoth, J. Schweiger. Strong Convex Nonlinear Relaxations of the Pooling Problem, ZIB Report 18-12 (submitted) and C. D'Ambrosio, J.T. Linderoth, J. Luedtke. Valid Inequalities for the Pooling Problem with Binary Variables, Lecture Notes in Computer Science vol. 6655, Integer Programming and Combinatorial Optimization Conference - IPCO 2011, O. Gunluk and G.J. Woeginger eds., Springer-Verlag, pp. 117–129, 2011.

Figure 2.2: Representation of the set of interest (inputs I, pools L, outputs J)

$$\min \sum_{(i,j) \in A} c_{ij} x_{ij} \quad (2.7)$$

$$\sum_{l \in L:(i,l) \in A} x_{il} + \sum_{j \in J:(i,j) \in A} x_{ij} \le C_i \quad \forall i \in I \quad (2.8)$$

$$\sum_{j \in J:(l,j) \in A} x_{lj} \le C_l \quad \forall l \in L \quad (2.9)$$

$$\sum_{l \in L:(l,j) \in A} x_{lj} + \sum_{i \in I:(i,j) \in A} x_{ij} \le C_j \quad \forall j \in J \quad (2.10)$$

$$\sum_{i \in I:(i,l) \in A} q_{il} = 1 \quad \forall l \in L \quad (2.11)$$

$$w_{ilj} = q_{il} x_{lj} \quad \forall i \in I, l \in L, j \in J : (i,l) \in A, (l,j) \in A \quad (2.12)$$

$$x_{il} = \sum_{j \in J:(l,j) \in A} w_{ilj} \quad \forall i \in I, l \in L : (i,l) \in A \quad (2.13)$$

$$\sum_{i \in I:(i,j) \in A} (\gamma_{ik} - \gamma_{jk}) x_{ij} + \sum_{l \in L:(l,j) \in A} \sum_{i \in I:(i,l) \in A} (\gamma_{ik} - \gamma_{jk}) w_{ilj} \le 0 \quad \forall j \in J, k \in K \quad (2.14)$$

$$\sum_{i \in I:(i,l) \in A} w_{ilj} = x_{lj} \quad \forall l \in L, j \in J : (l,j) \in A \quad (2.15)$$

$$\sum_{j \in J:(l,j) \in A} w_{ilj} \le C_l q_{il} \quad \forall i \in I, l \in L : (i,l) \in A \quad (2.16)$$

$$0 \le x_{ij} \le C_{ij} \quad \forall i \in I \cup L, j \in L \cup J : (i,j) \in A \quad (2.17)$$

$$0 \le q_{il} \le 1 \quad \forall i \in I, l \in L : (i,l) \in A, \quad (2.18)$$

where, $\forall i \in I \cup L, j \in L \cup J : (i,j) \in A$, $x_{ij}$ represents the flow going from node i to node j and, $\forall i \in I, l \in L : (i,l) \in A$, $q_{il}$ is the proportion of material in pool l coming from source i. The objective function (2.7) minimizes the cost of production, while constraints (2.8)-(2.10) impose the maximum capacity at the nodes in I, L, and J, respectively. Constraints (2.11) force the proportions for a given l to sum up to 1, and constraints (2.12) give the definition of the dependent variables $w_{ilj}$; note that this is also the only non linear set of constraints. Constraints (2.13) define the relationship between x and w, and constraints (2.14) impose an upper bound on the quality of each final product. Constraints (2.15) and (2.16) are the so-called Reformulation Linearization Technique [67] constraints, which are redundant in the sense that they are obtained by combining previously introduced constraints. However, they are not redundant in the standard linear relaxation of the problem (McCormick envelopes [50] of constraints (2.12)) and, on the contrary, help to strengthen the linear relaxation of (2.7)-(2.18). Finally, constraints (2.17) and (2.18) are simple bounds on the variables x and q, respectively.
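To make the role of constraints (2.11)-(2.15) concrete, the tiny hand-made instance below (one pool, two inputs, one output, one attribute, no direct input-to-output arcs; all numbers invented for illustration) checks a candidate solution against the bilinear definition of w and the quality bound:

```python
# toy data: attribute value gamma for each input and the output's quality bound
gamma = {"i1": 3.0, "i2": 1.0}   # input qualities (hypothetical)
gamma_max = 2.0                  # quality upper bound at the single output j
q = {"i1": 0.4, "i2": 0.6}       # proportions q_il at the single pool l
x_lj = 10.0                      # flow from pool l to output j

w = {i: q[i] * x_lj for i in q}  # (2.12): w_ilj = q_il * x_lj

assert abs(sum(q.values()) - 1.0) < 1e-9    # (2.11): proportions sum to 1
assert abs(sum(w.values()) - x_lj) < 1e-9   # (2.15): RLT constraint holds
# (2.14) with no direct input->output arcs: blend quality must not exceed gamma_max
quality_slack = sum((gamma[i] - gamma_max) * w[i] for i in q)
assert quality_slack <= 1e-9   # blend quality 0.4*3 + 0.6*1 = 1.8 <= 2.0
```

In a solver, w would be a decision variable and (2.12) the bilinear constraint to be relaxed; the point of the check is that (2.15) holds automatically once (2.11) and (2.12) do, which is why it only strengthens the *linear* relaxation.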
Main contribution. The aim of the submitted paper is to strengthen the PQ-formulation. To derive a stronger relaxation of the pooling problem, we identify a simple relaxation of the pooling problem step by step. First, we consider a single attribute $k \in K$ and relax all constraints concerning the other attributes. Next, we consider a single output $j \in J$ and remove all nodes and arcs which are not on a path to output j; in particular, this also removes the other outputs. Then, we focus on a pool $l \in L$ with the intention to split flows into two categories, i.e., the flow through pool l and the aggregated flow on all paths not passing through pool l. Finally, we aggregate all the flow from the inputs to pool l. We can then project the remaining variables and constraints onto a five-variable set T that is a non convex relaxation of the pooling problem. We performed an extreme point analysis, derived valid inequalities that fully characterize the convex hull of the set T, and lifted them back to the original space to derive valid inequalities for the pooling problem. Computational results demonstrate that the inequalities can significantly strengthen the convex relaxation of even the most sophisticated formulations of the pooling problem.

Let us now consider the pooling problem with binary variables, i.e., a variant of the standard pooling problem where the topology of the network is not fixed. The reader is referred to the exhaustive survey [52] for a larger view on pooling problem variants. The pooling problem with binary variables can be modeled as a non convex MINLP; let us add a few parameters, variables, and constraints. The new parameters are $f_{il}$, representing the fixed cost that has to be paid if the arc from input stream $i \in I$ to pool $l \in L$ is used. The new decision variables are $v_{il}$, binary variables with value 1 if the arc from input $i \in I$ to pool $l \in L$ is used, and 0 otherwise. The objective (2.7) is modified as follows:

$$\min \sum_{(i,j) \in A} c_{ij} x_{ij} + \sum_{i \in I} \sum_{l \in L} f_{il} v_{il}.$$

The q variables must satisfy the properties of proportions and can only be positive if the associated arc is opened. Thus, (2.11) is replaced by

$$\sum_{i \in I} q_{il} \le 1 \quad \forall l \in L;$$

$$q_{il} \le v_{il} \quad \forall i \in I, \forall l \in L.$$

Main contribution. Also in this case, we focus on a single output and a single attribute and extract an interesting set for which we derive valid inequalities. The set is a relaxation of the pooling problem with binary variables and a generalization of the well-known single-node fixed-charge flow set [57]. We introduce four classes of inequalities for this set. We call the first three classes quality inequalities, because they are based on the constraints imposing the quality upper bound for a final product. The final class of valid inequalities is called pooling flow cover inequalities, as they are related to, but dominate, standard flow cover inequalities for the single-node fixed-charge flow set. We show that there are exponentially many pooling flow cover inequalities, but also that they can be separated in polynomial time. Our new valid inequalities prove quite useful computationally on large instances. Inequalities generated at the root node of a branch-and-bound tree were added to a standard model for the problem and given to the global optimization solver BARON. On a test suite of 76 instances solvable by BARON in 2 hours, adding the inequalities to the model reduced BARON's CPU time by a factor of 2.

2.2 On Separable Functions

In the following, we present two global optimization methods that exploit the properties of separable functions. In particular, in Section 2.2.1 we aim at solving MINLPs with constraints including non convex separable functions, which are underestimated by distinguishing their convex and concave parts. In Section 2.2.2 we use separable underestimators for solving box-constrained polynomial optimization problems.

2.2.1 SC-MINLP3

In this section we focus on MINLP problems with separable non convexities, defined as follows:

$$\min \sum_{j \in N} C_j x_j$$

$$f_i(x) + \sum_{k \in H_i} g_{ik}(x_k) \le 0 \quad \forall i \in M$$

$$L_j \le x_j \le U_j \quad \forall j \in N$$

$$x_j \text{ integer} \quad \forall j \in I,$$

where:

• $f_i : \mathbb{R}^n \to \mathbb{R}$ are convex functions $\forall i \in M$;

• $g_{ik} : \mathbb{R} \to \mathbb{R}$ are non convex univariate functions $\forall i \in M, \forall k \in H_i$;

• $H_i \subseteq N$ $\forall i \in M$; and

• $I \subseteq N$.

For $j \in H_i$ we assume that the bounds are finite, $\forall i \in M$. Note also that, if $H_i$ is empty, the corresponding i-th constraint defines a convex set. The problem, which we call P in the following, is an MINLP problem with separable non convexities represented by the terms $\sum_{k \in H_i} g_{ik}(x_k)$, for each $i \in M$ with $H_i$ non empty.

Main contribution. We propose the Sequential Convex MINLP (SC-MINLP) algorithm, a global optimization method for solving mixed integer non linear programming problems with separable non convexities. It is based on a sequence of convex mixed integer non linear programming relaxations and non convex non linear programming restrictions of the problem; these subproblems are solved at each iteration of the algorithm. At each iteration, the SC-MINLP algorithm provides a lower bound, as the solution of the convex MINLP relaxation of P, and an upper bound, as the solution of the non convex NLP restriction of P. Both problems are solved through solvers used as "black boxes". Let us now introduce first the convex MINLP relaxation and then the non convex non linear programming restriction.

Convex MINLP relaxation. For simplicity, let us consider a term of the form $g(x_k) := g_{ik}(x_k)$ such that $g : \mathbb{R} \to \mathbb{R}$ is a univariate non convex function of $x_k$, for some k ($k \in N$).

3This section summarizes the contribution presented in: C. D'Ambrosio, J. Lee, A. Wächter. An algorithmic framework for MINLP with separable non-convexity, J. Lee and S. Leyffer (Eds.): Mixed-Integer Nonlinear Optimization: Algorithmic Advances and Applications, The IMA Volumes in Mathematics and its Applications, Springer New York, 154, pp. 315-347, 2012 and C. D'Ambrosio, J. Lee, A. Wächter. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity, A. Fiat and P. Sanders (Eds.): ESA 2009 (17th Annual European Symposium, Copenhagen, Denmark, September 2009), Lecture Notes in Computer Science 5757, pp. 107-118, Springer-Verlag Berlin Heidelberg, 2009.

As the term $g(x_k)$ is univariate, it is possible to automatically detect its concavity/convexity intervals and to compute the following parameters and sets:

$[P_{p-1}, P_p]$ := the p-th subinterval of the domain of g ($p \in \{1, \ldots, \bar{p}\}$);

$\check{H}$ := the set of indices of subintervals on which g is convex;

Hˆ := the set of indices of subintervals on which g is concave.
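The detection of these intervals can be sketched numerically by scanning the sign of g'' on a grid (a real implementation would rather analyze the symbolic expression of g; the test function, domain, and grid size below are illustrative):

```python
import math

def convexity_pieces(d2g, lo, hi, n=2000):
    """Split [lo, hi] at the sign changes of g'' sampled on a grid.
    Returns the breakpoints P_0..P_pbar and a 'convex'/'concave' label per piece."""
    xs = [lo + (hi - lo) * t / n for t in range(n + 1)]
    breakpoints, labels = [lo], []
    convex = d2g(xs[0]) >= 0
    for x in xs[1:]:
        now_convex = d2g(x) >= 0
        if now_convex != convex:
            breakpoints.append(x)
            labels.append("convex" if convex else "concave")
            convex = now_convex
    breakpoints.append(hi)
    labels.append("convex" if convex else "concave")
    return breakpoints, labels

# g(x) = sin(x) on [0.1, 2*pi]: concave up to x = pi, convex afterwards
pts, labs = convexity_pieces(lambda x: -math.sin(x), 0.1, 2 * math.pi)
```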

The main idea is to find a relaxation of P that keeps the convex parts of $g(x_k)$ while replacing the concave intervals with a piecewise linear relaxation, see Figure 2.3.

Figure 2.3: Univariate function $g(x_k)$ on the left and its convex MINLP relaxation on the right
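A minimal numerical version of this construction, taking g(x) = sin(x) on [0, 2π] as an illustrative term (one concave piece followed by one convex piece, breakpoint at π): the chord of a concave function lies below it, so the concave piece can be replaced by its secant, while the convex piece is kept as is.

```python
import math

P = [0.0, math.pi, 2 * math.pi]   # breakpoints: [P0,P1] concave, [P1,P2] convex

def g(x):
    return math.sin(x)

def relaxed(x):
    """Piecewise relaxation: chord on the concave piece, g on the convex piece."""
    if x <= P[1]:
        t = (x - P[0]) / (P[1] - P[0])
        return (1 - t) * g(P[0]) + t * g(P[1])   # secant of the concave piece
    return g(x)

# the relaxation never exceeds g, so minimizing it gives a valid lower bound
for i in range(201):
    x = 2 * math.pi * i / 200
    assert relaxed(x) <= g(x) + 1e-9
```

In SC-MINLP the pieces are stitched together through the δ and z variables introduced below, so that each lower bounding subproblem remains a convex MINLP; here a single secant plays the role of the piecewise linear part.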

To do this, we replace the term $g(x_k)$ with:

$$\sum_{p \in \check{H}} g(P_{p-1} + \delta_p) + \sum_{p \in \hat{H}} \left( \alpha_p g(P_p) + (1 - \alpha_p) g(P_{p-1}) \right) - \sum_{p=1}^{\bar{p}-1} g(P_p),$$

and we include the following set of new variables:

$\delta_p$: a continuous variable assuming a positive value iff $x_k \ge P_{p-1}$ ($p \in \{1, \ldots, \bar{p}\}$);

$z_p \in \{0, 1\}$: a binary variable taking value 1 iff $x_k \ge P_p$ ($p \in \{1, \ldots, \bar{p}-1\}$);

$\alpha_p \in [0, 1]$: the weight of the breakpoints of subinterval p ($p \in \hat{H}$);

and new constraints:

$$x_k = P_0 + \sum_{p=1}^{\bar{p}} \delta_p$$

$$\delta_p \ge (P_p - P_{p-1}) z_p \quad \forall p \in \check{H} \cup \hat{H}$$

$$\delta_p \le (P_p - P_{p-1}) z_{p-1} \quad \forall p \in \check{H} \cup \hat{H}$$

$$\delta_p = (P_p - P_{p-1}) \alpha_p \quad \forall p \in \hat{H}$$

with the two dummy variables $z_0 := 1$ and $z_{\bar{p}} := 0$. Note that in [21, 22] the piecewise linear part of the relaxation is improved iteration by iteration by adding further breakpoints; we do not discuss this aspect, however, as it is not the focus of this work.

Non convex non linear restriction. At each iteration, an upper bound is obtained by fixing the integer variables to the values they take in the solution of the convex MINLP relaxation and then solving the resulting non convex non linear programming problem to local optimality.

If the lower bound and the upper bound coincide, then the corresponding solution is optimal for P. If this is not the case, we improve the convex MINLP relaxation by adding a breakpoint to each concave part which presents a non negligible error at the solution of the lower bounding problem, see, for example, Figure 2.4. This provides a theoretical convergence guarantee for SC-MINLP.

Figure 2.4: Initial convex MINLP relaxation and its improvement

More recently, we worked on strengthening the convex MINLP relaxation based on the Perspective Reformulation (PR), see [27], which can be applied because the lower bounding problem is a piecewise convex problem. In particular, we use a PR of each convex piece. More formally, let us consider a convex piece $p \in \check{H}$ and let us define $\Delta_p = P_p - P_{p-1}$. The relevant parts of the convex MINLP relaxation are $g(P_{p-1} + \delta_p) - g(P_{p-1})$ and $\Delta_p z_p \le \delta_p \le \Delta_p z_{p-1}$, with $z_{p-1} \in \{0, 1\}$ and $z_p \in \{0, 1\}$, which define the non convex function

$$g(\delta_p, z_{p-1}) = \begin{cases} 0 & \text{if } z_{p-1} = 0 \text{ and } \delta_p = 0 \\ g(P_{p-1} + \delta_p) - g(P_{p-1}) & \text{if } z_{p-1} = 1 \text{ and } \delta_p \le \Delta_p \\ +\infty & \text{otherwise} \end{cases}$$

and its convex envelope

$$\tilde{g}(\delta_p, z_{p-1}) = \begin{cases} 0 & \text{if } z_{p-1} = 0 \text{ and } \delta_p = 0 \\ z_{p-1} \left( g\left(P_{p-1} + \frac{\delta_p}{z_{p-1}}\right) - g(P_{p-1}) \right) & \text{if } z_{p-1} \in (0, 1] \text{ and } \delta_p \le \Delta_p \\ +\infty & \text{otherwise.} \end{cases}$$

We use the function $\tilde{g}(\delta_p, z_{p-1})$ to define the PR of the lower bounding problem, which is stronger than the one presented in the previous section. Preliminary computational results show that this approach outperforms both the standard algorithm implemented in Bonmin and the standard linearization of convex MINLPs solved via Cplex.
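The strengthening can be checked on a hypothetical convex piece, say $g(u) = u^2$ with $P_{p-1} = 0$ and $\Delta_p = 1$ (both choices are illustrative, not taken from the paper): the perspective term $z \, (g(P + \delta/z) - g(P))$ coincides with the naive term $g(P + \delta) - g(P)$ at z = 1 and dominates it for fractional z, so the PR relaxation is at least as tight.

```python
def h(delta):
    """Naive on-interval contribution g(P + delta) - g(P) for g(u) = u**2, P = 0."""
    return delta ** 2

def h_persp(delta, z):
    """Perspective reformulation term z * h(delta / z), for z in (0, 1]."""
    return z * h(delta / z)

# equality when the indicator is 1, dominance for fractional z
assert h_persp(0.7, 1.0) == h(0.7)
for z in (0.2, 0.5, 0.9):
    for k in range(1, 10):
        delta = z * k / 10          # respect delta <= Delta * z with Delta = 1
        assert h_persp(delta, z) >= h(delta) - 1e-12
```

For this quadratic piece the perspective term is simply δ²/z, which grows as z shrinks: this is precisely the extra tightening that the PR adds to the lower bounding problem.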

2.2.2 Polyopt4

In this section we focus on solution methods for a different subclass of MINLPs, namely box-constrained mixed-integer polynomial optimization problems:

$$\text{(MIPO)} \quad \min f(x) \quad \text{s.t.} \quad x \in [l, u], \quad x_i \in \mathbb{Z} \; \forall i \in I,$$

where $l, u \in \mathbb{R}^n$ (with $l \le u$ and $[l, u] := \prod_{i=1}^n [l_i, u_i]$) and $f = \sum_{\alpha \in A} c_\alpha x^\alpha$ is an arbitrary polynomial of maximum degree $d \in \mathbb{N}$, with $I \subseteq \{1, \ldots, n\}$. We can assume without loss of generality that $l_i, u_i \in \mathbb{Z}$ for all $i \in I$.

The method we propose, called PolyOpt, is a branch-and-bound method based on non convex, separable underestimators of the form $g(x) = \sum_{i=1}^n g_i(x_i)$, where the $g_i(x_i)$ are univariate polynomials of degree at most $\bar{d}$. We consider $\bar{d} = 2 \lceil d/2 \rceil$, which is the smallest value that ensures the existence of g(x).

We show in the following how to obtain such underestimators; their minimization is very fast, and they provide a good trade-off between approximation quality and tractability, which is very important to solve non convex MINLPs quickly.

First of all, we note that finding the optimal underestimator g(x) for f(x) is an intractable problem. Thus, we simplify the underestimator construction phase by proceeding monomial by monomial: an underestimator g(x) can be obtained by summing up the underestimators of each monomial. Note that g(x) is typically not the best underestimator of f(x), but computing it monomial by monomial makes the underestimator building phase efficient. Moreover, we observe that, given an underestimator of a monomial defined on the variable bounds [−1, +1], we can efficiently compute the underestimator of the monomial with variable bounds defined on any interval.

Thanks to the previous simplifications and observations, we build g(x) in the following phases. First, we compute the optimal underestimator of all monomials of degree $\le d$ with variable domains defined on [−1, +1]. In practice, we do it for $d \le 4$ (we refer the

4This section summarizes the contribution presented in: C. Buchheim, C. D'Ambrosio. Monomial-wise Optimal Separable Underestimators for Mixed-Integer Polynomial Optimization, Journal of Global Optimization, 67 (4), pp.
759–786, 2017 and C. Buchheim, C. D'Ambrosio. Box-Constrained Mixed-Integer Polynomial Optimization Using Separable Underestimators, Lecture Notes in Computer Science vol. 8494, Integer Programming and Combinatorial Optimization Conference - IPCO 2014, J. Lee and J. Vygen eds., Springer International Publishing, pp. 198–209, 2014.

reader to [9, 10] for the related proofs). This step can be done offline and just once. Then, at each node of the branch-and-bound, we perform the following steps:

• shift each variable so that its bounds are symmetric with respect to 0;

• scale each variable domain so that the bounds become [−1, +1];

• replace each monomial with its predefined underestimator;

• sum up the univariate monomials so obtained and simplify them;

• scale the variables back;

• shift the variables back, obtaining g(x).

As (MIPO) has no constraints linking more than one variable, replacing f(x) with its underestimator g(x) makes the problem separable, i.e., it is possible to minimize g(x) by solving n subproblems, one for each variable. To solve each subproblem, it is sufficient to find the $\bar{d} - 1$ points satisfying the first order condition $g_i'(x) = 0$. This can be done analytically for $\bar{d} \le 5$, and numerically otherwise. The only possible minima are these points and the variable bounds. If the variables are integer, the only possible minima are the roundings up or down of these points and the variable bounds. Thus, the minimization of each subproblem is extremely fast. Moreover, it is possible to obtain an upper bound for (MIPO) by re-evaluating the solution of the underestimating problem with respect to f.

We developed PolyOpt, the resulting branch-and-bound algorithm, and compared it against standard software for mixed integer non linear optimization, such as BARON [64], Couenne [7], and SCIP [66], and against one of the best performing solvers for continuous polynomial optimization, i.e., GloptiPoly [30]. PolyOpt outperforms all the other solvers on continuous, integer, and mixed-integer instances with variable bounds equal to [−10, 10], [−1, 1], and [0, 1], the only exception being the binary case where, though, it is still competitive.

Chapter 3

Heuristic Methods

In this chapter, we present heuristic methods. In particular, Section 3.1 is devoted to heuristic methods for general MINLPs, namely Feasibility Pump and Optimistic Approximation. Section 3.2 includes methods for specific classes of MINLPs, in particular the non linear knapsack problem.

3.1 General methods

We present in the following two methods for general MINLP problems which are meant to provide good quality solutions in a short time. The first, namely Feasibility Pump, is based on two main simpler relaxations of MINLP problems, while the second, namely Optimistic Approximation, is based on a new linearization strategy for the non linear objective function and constraints. Even though these two methods can be applied to general MINLPs, they exhibit special, nice properties for subclasses of MINLPs.

3.1.1 Non convex Feasibility Pump¹

The Feasibility Pump was introduced by Fischetti et al. [26] for mixed integer linear programming problems with binary variables, then extended to convex MINLPs [8] and to non convex MINLPs [18, 19]. It is an iterative algorithm that, at each iteration, solves two easier-to-solve relaxations of the original problem, obtained by relaxing two disjoint subsets of constraints. Thus, at each iteration k, two solutions x̂^k and x̌^k, one for each relaxation, are produced. The aim is to make the two sequences x̂^1, . . . , x̂^t and x̌^1, . . . , x̌^t converge to the same point x̂^t = x̌^t with t > 0, which is, by definition of the relaxations, a feasible solution of the original problem. To this end, the objective functions of the two subproblems

¹ This section summarizes the contribution presented in: C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. A Storm of Feasibility Pumps for Nonconvex MINLP, Mathematical Programming, 136(2), pp. 375–402, 2012, and C. D'Ambrosio, A. Frangioni, L. Liberti, A. Lodi. Experiments with a Feasibility Pump Approach for Non-Convex MINLPs, Lecture Notes in Computer Science vol. 6049, 9th International Symposium on Experimental Algorithms – SEA 2010, P. Festa ed., Springer-Verlag, pp. 350–360, 2010.

are not the original objective function f(x), but the distance with respect to the last optimal solution found for the other subproblem. If we consider a MILP [26], (1.1) can be written as

min_x  c^T x
       a_i^T x − b_i ≤ 0   ∀ i = 1, . . . , m
       x ∈ X

where X includes simple bounds and integrality requirements. The two subproblems are defined as

min_x  ‖x − x̌^{k−1}‖
       a_i^T x − b_i ≤ 0   ∀ i = 1, . . . , m
       x ∈ R^n

and

min_x  ‖x − x̂^k‖
       x ∈ X

where in the first subproblem the integrality requirements are relaxed, while in the second subproblem the constraints (bounds and integrality requirements excluded) are relaxed. The objective function of the first subproblem is simplified to make it linear and, thus, effectively solvable, i.e., min_x Σ_{j ∈ N^2 : x̌_j = 0} x_j + Σ_{j ∈ N^2 : x̌_j = 1} (1 − x_j), where N^2 = {n_1 + 1, . . . , n}. As for the second subproblem, since no constraint is left besides the simple bounds and the integrality requirements, solving it corresponds to a rounding phase. Thus, both subproblems are fast to solve. However, convergence can be reached only if the solutions of the two subproblems are not explored more than once, i.e., if cycles in the sequences of solutions are avoided. Fischetti et al. proposed to avoid cycles by randomly flipping the values of the solution x̌ if it is already present in a "taboo list", filled iteration after iteration. The resulting algorithm shows good performance in finding feasible solutions in a short time on MILP problems from the literature. In the case of convex MINLP studied by Bonami et al. [8], the two subproblems are

min_x  ‖x − x̌^{k−1}‖_2    (3.1)
       g_i(x) ≤ 0   ∀ i = 1, . . . , m    (3.2)
       x ∈ X    (3.3)

and

min_x  ‖x − x̂^k‖_1    (3.4)
       g_{ik}(x) ≤ 0   ∀ i = 1, . . . , m, ∀ k ∈ K    (3.5)
       x ∈ X    (3.6)

where g_{ik}(x) is the outer-approximation linearization of g_i(x) at x̂^k. Note that the first subproblem is a convex NLP and the second can be written as a MILP. The set K is enriched at each iteration, thus the MILP relaxation is improved iteration after iteration. Moreover, the sequence of solutions x̌ cannot cycle.

Main contribution. We further extended the FP algorithm to non convex MINLPs. We also introduced the interpretation of FP as a Successive Projection Method (SPM) for finding a point x ∈ A ∩ B. In fact, it was proved that this feasibility problem can be rewritten as min{‖x̂ − x̌‖ | x̂ ∈ A, x̌ ∈ B}. This is useful if optimizing over A and B separately is easier than optimizing over A ∩ B. Given an initial point x̌^0, the SPM generates a sequence of points (x̂^k, x̌^k) such that x̂^k = argmin{‖x̂ − x̌^{k−1}‖ | x̂ ∈ A} and x̌^k = argmin{‖x̂^k − x̌‖ | x̌ ∈ B}. It is easy to see that, as discussed below, each iteration improves the objective function of the two subproblems, so that one can hope that the sequence will converge to some fixed point, which therefore provides a positive answer to the feasibility problem. The similarities of SPM with FP are clear: A represents the feasible set of the first subproblem, while B represents the feasible set of the second subproblem. Two of the main differences are the following: (i) A and B are typically convex sets in SPM but not in FP; (ii) the norms used in the objective functions of the two subproblems are the same in SPM, while they differ in FP. This is motivated by the wish to preserve the practical tractability of the subproblems, as, for example, for the outer-approximation linearization subproblem of the FP for convex MINLPs. However, when the norms do not coincide, it is not guaranteed that the objective functions of the two subproblems improve at each iteration, as in SPM.
To cope with this issue while retaining the local convergence properties of standard SPMs, we propose the introduction of appropriate norm constraints in the subproblems, i.e., adding a norm constraint to the first subproblem. These observations allow us to design many possible different variants of the FP, depending on how several different (orthogonal) implementation choices are taken. Different variants were tested and compared in [19] on MINLP instances from the literature. The new interpretation of FP as an SPM provides a wider framework that can be, if needed, tailored to a specific subclass of MINLPs. For example, in Section 4.6 we briefly describe how we tailored the FP algorithm to solve a challenging real-world problem, i.e., aircraft deconfliction with speed regulation, which is a crucial problem of aircraft conflict avoidance arising in air traffic management.
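For concreteness, the basic FP loop for an all-binary MILP can be sketched as follows. This is an illustrative toy (function name and flip rule are our own, not the authors' implementation): the LP subproblem minimizes the linearized distance to the last rounded point over the continuous relaxation, the second subproblem reduces to rounding, and cycling is countered by random flips in the spirit of the taboo list.

```python
import numpy as np
from scipy.optimize import linprog

def feasibility_pump(A, b, n, max_iter=100, rng=np.random.default_rng(0)):
    """Toy FP for {x in {0,1}^n : A x <= b} (all variables binary)."""
    x_round = rng.integers(0, 2, n).astype(float)   # initial rounding
    seen = set()
    for _ in range(max_iter):
        # LP subproblem: minimize the L1 distance to x_round over the
        # continuous relaxation; for binaries the distance is linear:
        # sum_{j: x_round_j = 0} x_j + sum_{j: x_round_j = 1} (1 - x_j).
        c = np.where(x_round == 0, 1.0, -1.0)
        res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n, method="highs")
        x_lp = res.x
        x_new = np.round(x_lp)                      # "rounding" subproblem
        if np.allclose(x_lp, x_new):                # sequences met: feasible point
            return x_new
        key = tuple(x_new)
        if key in seen:                             # cycle: random flips (taboo idea)
            flips = rng.choice(n, size=max(1, n // 4), replace=False)
            x_new[flips] = 1.0 - x_new[flips]
        seen.add(tuple(x_new))
        x_round = x_new
    return None
```

When the LP optimum is already integral, the two sequences coincide and a feasible point is returned immediately; otherwise the two subproblems keep pulling the iterates toward each other.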

3.1.2 Optimistic approximation²

One of the classic methods used in the past and nowadays, especially when one is interested in finding a feasible solution of a MINLP in a short time, is to approximate the non linear functions in the objective function and constraints of the MINLP by piecewise

² This section summarizes the contribution presented in: R. Rovatti, C. D'Ambrosio, A. Lodi, S. Martello. Optimistic MILP Modeling of Non-linear Optimization Problems, European Journal of Operational Research, 239(1), pp. 32–45, 2014.

linear approximations. The standard approximations based on the affine combination of breakpoints are known to be underestimators for concave functions and overestimators for convex functions. However, when the functions are neither convex nor concave, the approximation is neither a relaxation nor a restriction, thus it provides neither lower nor upper bounds. In the literature, many different ways to model piecewise linear approximations have been presented; see, for example, [72] for an extensive overview. All these models assume that, given a non linear function of n variables, on each piece into which the variables domain is divided, the non linear function is approximated by a hyperplane defined by n + 1 non degenerate breakpoints. Thus, each hyperplane is uniquely defined as an affine combination of the n + 1 breakpoints. For example, in 3D we divide the variables domain into triangles and, on each triangle, the hyperplane is uniquely defined by the 3 vertices of the triangle, evaluated at the non linear function the hyperplane approximates.

Main contribution. Our approach differs because we do not assume that each hyperplane composing the piecewise linear approximation is uniquely defined. On the contrary, we divide each variable domain into intervals, which implies that the variables domain is divided into hyper rectangles. On each hyper rectangle of the variables domain, the non linear function is approximated as an affine combination of the vertices of the hyper rectangle evaluated at the non linear function. Consequently, we leave a higher degree of freedom to the approximation, which is represented by a thicker portion of space than hyperplanes.
Let us consider the 3D case: the portion of the space approximating the non linear function would be the one between the two pairs of hyperplanes that would be identified in the standard approach by dividing the rectangle of interest into two triangles, considering either one diagonal or the other (see Figure 3.1).

Figure 3.1: The portion of the space approximating the non linear function lies between the two triangles in grey.

The feasible set is larger than the one of the standard approach, thus the objective function will drive the optimization to pick one of the points approximating the function. This is why we call this approach "the optimistic approximation". The model resulting from this approach is a MILP, as in the standard approaches, but with a smaller number of binary variables: formally, it is of the order of the sum of the numbers of intervals into which each variable domain is divided, instead of the order of their product. The positive impact of the reduction of binary variables is confirmed by the results of extensive computational tests.

The proposed approximation owns several interesting properties: continuity; a bound on the approximation error corresponding to the worst case of the standard approach; it provides the best possible linear interpolation of the given breakpoints in the case of minimization of a convex function and the best possible lower bound in the case of maximization of a concave function. These properties make the approach even more interesting for approximating specific classes of MINLP problems.
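As a small illustration (our own toy example, not taken from the paper), the "thickness" of the optimistic approximation at a point x can be computed by minimizing and maximizing the affine combination of the vertex values over all convex multipliers of the hyper-rectangle vertices that reproduce x:

```python
import numpy as np
from scipy.optimize import linprog

def optimistic_range(vertices, f_vals, x):
    """Range of sum_v lambda_v * f(v) over convex combinations of the
    hyper-rectangle vertices that reproduce the point x."""
    V = np.asarray(vertices, float)          # (k, n) vertex coordinates
    k, n = V.shape
    # Equality constraints: V^T lambda = x  and  sum(lambda) = 1
    A_eq = np.vstack([V.T, np.ones((1, k))])
    b_eq = np.concatenate([np.asarray(x, float), [1.0]])
    lo = linprog(f_vals, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, None)] * k, method="highs").fun
    hi = -linprog(-np.asarray(f_vals), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs").fun
    return lo, hi

# f(x1, x2) = x1 * x2 on the unit square: the band at the center spans the
# values of the two triangulation planes of the standard approach.
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
vals = [x1 * x2 for x1, x2 in corners]
lo, hi = optimistic_range(corners, vals, (0.5, 0.5))   # band [0.0, 0.5]
```

The interval [lo, hi] is exactly the vertical extent of the "thick" portion of space at x; a minimization model is free to optimistically pick the lower value.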

3.2 Tailored methods

The heuristic methods for specific subclasses of MINLP problems exploit even more intensively the structure and characteristics of the problem at hand. In the following, we report an example of a challenging problem, namely the non linear knapsack problem.

3.2.1 Non linear Knapsack Problem³

Given n items and a knapsack, the non linear knapsack problem (KP) consists in deciding how many units of each item j to load in the knapsack so as to maximize the profit provided by the loaded quantity of each item while satisfying the knapsack capacity. A mathematical programming formulation can be the following

max  Σ_{j ∈ N} f_j(x_j)    (3.7)
     Σ_{j ∈ N} g_j(x_j) ≤ c    (3.8)
     0 ≤ x_j ≤ u_j   ∀ j ∈ N    (3.9)
     x_j integer   ∀ j ∈ Ñ ⊂ N    (3.10)

where N = {1, . . . , n} and x_j are the decision variables. The objective function represents the total profit, while the first constraint concerns the knapsack capacity. The second set of constraints imposes simple bounds on the variables (non negative quantity and upper bound u_j on the availability of each item j ∈ N), and the last set of constraints models the fact that some items are indivisible. We assume that f(x) and g(x) are twice continuously differentiable, separable, non linear, non negative, non decreasing functions. The proposed heuristic exploits these properties of f(x) and g(x).
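To fix ideas, here is a minimal sketch of a discretization-based greedy in the spirit of the constructive heuristic described below under Main contribution. It is our own simplification: continuous variables only, no refinement step and no local search, and the quantity taken is simply the best discretization point still fitting in the residual capacity.

```python
def greedy_nl_knapsack(f, g, u, c, s=100):
    """Discretization-based greedy for max sum f_j(x_j) s.t. sum g_j(x_j) <= c,
    0 <= x_j <= u_j, with f_j, g_j nondecreasing and f_j(0) = g_j(0) = 0.
    f, g are lists of callables; illustrative sketch only."""
    n = len(u)
    x = [0.0] * n
    used = 0.0

    def best_point(j):
        # Best (profit-to-weight ratio, quantity) over the s discretization
        # points of item j that still fit in the residual capacity.
        best = (0.0, 0.0)
        for k in range(1, s + 1):
            q = u[j] * k / s
            w = g[j](q)
            if w > 1e-12 and used + w <= c:
                best = max(best, (f[j](q) / w, q))
        return best

    # Consider items by decreasing best ratio, recomputing as capacity shrinks.
    order = sorted(range(n), key=lambda j: -best_point(j)[0])
    for j in order:
        ratio, q = best_point(j)
        if q > 0 and used + g[j](q) <= c:
            x[j] = q
            used += g[j](q)
    return x
```

With monotone profits and weights, every prefix of the greedy loading stays feasible, which is the property the actual constructive heuristic leans on as well.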

³ This section summarizes the contribution presented in: C. D'Ambrosio, S. Martello. Heuristic algorithms for the general nonlinear separable knapsack problems, Computers and Operations Research, 38(2), pp. 505–513, 2011.

Main contribution. The main ingredient of the constructive heuristic is the discretization of the solution space. Given a positive integer s, each variable domain is divided into s intervals of equal length. At each of the resulting s + 1 discretization points, the profit and the weight are computed, together with the profit-to-weight ratios. Taking some ideas from heuristics for the 0-1 KP, i.e., the knapsack problem with binary variables and linear profit and weight functions, we identify the item j and discretization point k with the best profit-to-weight ratio. The best item j is inserted in the knapsack up to a quantity that is as large as possible but for which the profit-to-weight ratio is still better than that of the second best item (thus, at least the quantity corresponding to discretization point k is inserted). Moreover, in a refinement step, the interval after the inserted quantity is further divided into smaller intervals and potentially a larger quantity of item j is taken. After the upper bound on item j, the knapsack capacity, and the profit-to-weight ratios are updated, the second best item is considered, and so on. To improve the solution found by the constructive heuristic, a local search algorithm is deployed to explore, for each couple of items, a potential increase of the quantity of one and decrease of the other. Extensive computational results clearly show that the proposed methods outperform all the solvers in finding a good feasible solution in short CPU time.

The heuristic algorithm was recently extended to the multiple non linear knapsack problem in [51]. Also in this case, it exploits the fact that the profit and the weight functions are non decreasing and that their value at 0 is 0.

Chapter 4

Other contributions

In this chapter, we briefly report some contributions on other research topics, like mixed integer linear programming, black-box optimization, multiobjective programming, and bilevel programming.

4.1 Feasibility issues on Hydro Unit Commitment

It is a well known fact that electricity cannot be stored easily, and that one of the most efficient and long-term methods for storing it is to turn it into potential energy by pumping water up mountain valleys into natural or artificial water basins. A system of interconnected valleys presents several issues when storing and re-using stored electricity. Specifically, scheduling the pumps, basins, and valleys is known as the Hydro Unit Commitment (HUC) problem. In France, hydroelectricity production represents the second source of production after nuclear power. HUC problems are usually modeled using mathematical programming and solved using a range of algorithms going from heuristic to exact. We study this crucial problem for Électricité de France (EDF) (see [65]). Hydro valleys are composed of interconnected reservoirs and hydro-power plants distributed all along the valley. We simplify the problem by linearizing the non linear functions via piecewise linear approximation. Thus, the problem can be formulated as a mixed integer linear program. Besides physical and classic constraints, at EDF they consider two additional specifications, namely the discrete definition of operational points of the power-flow curves and the target volumes. These "strategic" constraints can generate feasibility issues, which are the focus of our work. Other possible sources of feasibility issues are the standard issues affecting real-world data. Our contribution consists in proposing a method to analyze and detect the different sources of infeasibility so as to be able to isolate and fix them. We proceed via a step-by-step approach to identify one source of infeasibility at a time, i.e., numerical errors and model infeasibilities. The former are analyzed and fixed through tools like exact solvers and a preliminary processing of the model. The remaining infeasibilities are dealt with thanks

to a biobjective lexicographic approach. The first phase consists in solving the MILP that minimizes the deviation from target volumes while making the problem feasible. In the second phase, we solve a slightly relaxed problem, i.e., the original problem but allowing a possible deviation from the target volumes as calculated in the first stage. Computational results on 66 real-world instances provided by EDF showed that the lexicographic method is able to recover feasibility in a short amount of time and to compute quickly a solution in the great majority of the cases.
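The two-phase lexicographic idea can be sketched on a toy LP. This is our own simplification of the EDF MILP (generic cost c, feasibility constraints Ax ≤ b, target volumes Tx = v_target with nonnegative deviations d; the deviation cap in phase 2 is per target, one possible reading of "as calculated in the first stage"):

```python
import numpy as np
from scipy.optimize import linprog

def lexicographic_repair(c, A, b, T, v_target):
    """Two-phase lexicographic scheme (toy LP version): phase 1 finds the
    minimum deviations d from the target volumes T x = v_target that keep
    A x <= b feasible; phase 2 minimizes the true cost c under those caps."""
    m, n = A.shape
    k = T.shape[0]
    # Variables (x, d) with T x - v_target in [-d, d] and d >= 0.
    A1 = np.block([[A, np.zeros((m, k))],
                   [T, -np.eye(k)],
                   [-T, -np.eye(k)]])
    b1 = np.concatenate([b, v_target, -v_target])
    bounds = [(None, None)] * n + [(0, None)] * k
    phase1 = linprog(np.concatenate([np.zeros(n), np.ones(k)]),
                     A_ub=A1, b_ub=b1, bounds=bounds, method="highs")
    d_star = phase1.x[n:]
    # Phase 2: original cost, deviations capped at the phase-1 optimum.
    bounds2 = [(None, None)] * n + [(0, d) for d in d_star]
    phase2 = linprog(np.concatenate([c, np.zeros(k)]),
                     A_ub=A1, b_ub=b1, bounds=bounds2, method="highs")
    return phase2.x[:n], d_star
```

Phase 1 answers "how far from the targets must we go to be feasible at all?", and phase 2 then optimizes the original objective without sacrificing that answer.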

4.2 MILP models for the selection of a small set of well- distributed points

In [23], we consider the design of experiments phase of constrained black-box optimization, an initialization step that aims at sampling an unknown function hidden behind a black-box in order to fit a surrogate model on the set of selected points. Well-known measures of the quality of the selection of the points are good space-filling and orthogonality properties. In particular, these are the properties that make the latin hypercube design interesting and useful in general. However, when the problem is strongly constrained, the latin hypercube design approach fails. Our method addresses this difficult case.

Assuming we are given a large set of feasible points, our method selects a smaller subset of these points such that these properties are guaranteed. We do so by modeling the selection process as four different MILP models and show that the corresponding optimization problems are NP-hard. We tested our method and showed that our models select points that are well-distributed in the space and, on average, reduce the variance of model fitting errors. Clearly, our models might be too difficult to solve in real-world applications. However, they can be an effective method to find good feasible solutions thanks to a compromise between practical complexity and the meaningfulness of the good-properties measure.
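As an illustration of the space-filling criterion only (our own greedy stand-in, not one of the four exact MILP models of the paper), one can select a well-distributed subset by greedily maximizing the minimum pairwise distance:

```python
import math

def greedy_maximin(candidates, k):
    """Greedily pick k points from a candidate set so that the minimum
    pairwise distance (a space-filling measure) stays large."""
    pts = list(candidates)
    # Start from the two farthest-apart candidates.
    chosen = list(max(((p, q) for p in pts for q in pts if p != q),
                      key=lambda pq: math.dist(*pq)))
    while len(chosen) < k:
        # Add the candidate farthest from the current selection.
        best = max((p for p in pts if p not in chosen),
                   key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(best)
    return chosen
```

Unlike the exact MILP models, this greedy gives no guarantee, but it shows the flavor of the maximin objective on a feasible candidate set.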

4.3 The well placement and design problem in reservoir engineering

In [47] and [46] we study the problem of placement and design of wells in reservoir engineering. The objective is to maximize the estimated profit from oil production considering a long-term horizon of 10 to 50 years. The main difficulty with this problem is that the forecasted oil production is computed through time-consuming simulations, which makes it a black-box optimization problem. Our decision variables consist of (i) the number of wells and, for each active well, (ii) its location and type and (iii) the number of its branches and their trajectories. We propose a two-step method. In the first step, only vertical wells without branches are considered and decisions are taken on variables of types (i) and (ii) to maximize oil production. This is achieved thanks to a direct search method coupled with surrogate models approximating the objective function, to obtain a good estimation of the optimal number, location, and type of wells while limiting the number of simulations. A mixed integer Kriging model with adapted correlation functions is sequentially constructed during the direct search algorithm and is used in the exploration phase to identify good candidates to be evaluated. In the second step, we fix the decisions taken in step 1 and consider only variables of type (iii). Specifically, we analyze the responses provided by the fluid flow simulator (e.g., oil saturation distribution over space and time) and model the problem of defining the geometry of the rectilinear well branches as a MINLP problem. We then solve it through standard MINLP solvers and potentially improve the current solution. We tested our approach on small instances and obtained promising results. We plan to extend our tests to realistic problems so as to validate our method.

4.4 A branch-and-bound based heuristic for multiobjec- tive convex MINLPs

In [12], we study an algorithm for finding a good approximate set of non dominated points for convex multiobjective Mixed Integer Non Linear Programming problems. The proposed heuristic is based on a branch-and-bound algorithm. In the initialization step, the problem is solved with an ε-constraint method, where a single-objective model that incorporates the other objective functions as constraints is solved using different right-hand sides for each additional constraint. In the branch-and-bound tree search, dual bound vectors are computed by solving continuous, convex NLP relaxations considering one objective function at a time. At leaf nodes, the corresponding convex multiobjective NLP is solved heuristically through the classic weighted-sum approach. As the use of this method does not guarantee that the solutions found are non dominated, we designed a tailored refinement procedure to remove and improve dominated points. Finally, we tested the proposed algorithm on instances of the nonlinear knapsack problem and of HUC, comparing it with well-known scalarization methods and showing the effectiveness of the proposed method.
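The dominance filter at the core of such a refinement step can be sketched as follows (generic minimization setting, our own helper names; the paper's procedure additionally tries to improve dominated points rather than just discard them):

```python
def nondominated(points):
    """Keep the nondominated points of a multiobjective minimization problem:
    p dominates q if p <= q componentwise and p < q in at least one component."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

Applied to the points collected at the leaf nodes, this keeps only the approximate Pareto front.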

4.5 An algorithm for bilevel programming with an appli- cation to smart grids

In papers [43], [71], and [58] we considered the observability problem in smart grids. The observability of a network is provided by devices like Phasor Measurement Units (PMUs) that are capable of estimating the state of the grid. More formally, given an undirected graph representing the network, the problem of PMU placement consists of finding the minimum number of PMUs to place on the edges such that the graph is fully observed. This problem is also known as the Power Edge Set problem, a variant of the Power Dominating Set problem.

We identified two observability propagation rules derived from Ohm's and Kirchhoff's laws and designed two mathematical programs to model the problem. The first, a MILP model, is based on variable indexes representing the observability process iterations. The second, a bilevel model, includes the decisions on the PMU placement in the upper-level problem, while the observability propagation appears in the lower-level problem. This is possible by exploiting the fact that the observability propagation can be written as a minimal fixed point of a specific function. We propose to solve the bilevel problem in two different ways: first, the problem can be rewritten as a single-level MILP obtained by writing in the upper level the KKT conditions of the lower-level problem; second, a tailored cutting plane algorithm was designed to solve the bilevel problem. Computational results show that the cutting plane method is the only one that scales to large-size networks.
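A minimal sketch of such a fixed-point propagation follows. The rule below is a simplified node-observability version inspired by, but not identical to, the rules derived in the papers: a PMU on an edge observes both endpoints, and an observed vertex with exactly one unobserved neighbour propagates observability to it.

```python
def observed_closure(n, edges, pmu_edges):
    """Iterate the propagation rule to its minimal fixed point and return
    the set of observed vertices (simplified illustrative rule)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    obs = set()
    for u, v in pmu_edges:                  # a PMU observes both endpoints
        obs |= {u, v}
    changed = True
    while changed:                          # repeat until nothing propagates
        changed = False
        for v in list(obs):
            unseen = adj[v] - obs
            if len(unseen) == 1:            # single unobserved neighbour rule
                obs |= unseen
                changed = True
    return obs
```

On a path graph, a single PMU on the first edge observes the whole path, which is exactly the kind of cascade the lower-level problem of the bilevel model captures.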

4.6 Feasibility Pump algorithms for aircraft deconfliction with speed regulation

In [13] we tackle the problem of aircraft deconfliction with speed regulation, a crucial problem of aircraft conflict avoidance arising in air traffic management. In this context, it is particularly important to provide fast feasible solutions to resolve the critical, dangerous situations in which aircraft trajectories are too close to each other or, worse, aircraft collide. The aim of our study, funded by the ANR project "ATOMIC", was to propose tailored heuristics to find a good-quality solution in a short time. The problem can be modeled as a mixed integer nonlinear optimization problem, whose solution can be very computationally demanding. For this reason, we specialized the framework of the Feasibility Pump for non convex MINLP problems (Section 3.1.1) to the problem of aircraft deconfliction with speed regulation by identifying two complementary and easier-to-solve relaxations, namely a tailored non convex continuous relaxation and a tailored mixed integer linear relaxation. In this version of FP, at each iteration, we solve alternately these two subproblems, minimizing the distance between their solutions. We tested our algorithm on instances from the literature, showing that, on the considered test problems, good-quality, in some cases optimal, feasible solutions are always obtained, thus improving the best results from the literature.

4.7 On the Product Knapsack Problem

In [20] we study the Product Knapsack Problem (PKP). In this knapsack problem variant, where each item is characterized by a profit and a weight, we aim at maximizing the product of the profits of the selected items. We first show that the PKP is weakly NP-hard and present a Dynamic Programming algorithm adapted to this variant. Moreover, we introduce mixed integer linear and non linear programming formulations for the PKP. We test extensively and compare all the methods on a large set of instances.

4.8 Learning to Solve Hydro Unit Commitment Problems in France

We focus on the mixed integer linear programming version of the HUC problem and aim at solving it exactly in a short time. To do so, we focus on the configuration of the MILP solver and devise supervised Machine Learning (ML) techniques to learn a good solver configuration. In particular, we designed a Support Vector Regression (SVR) method to automatically learn a performance model of a complex system with many parameters, as a function of the numerical and structural properties of the instance being solved. After learning the performance model thanks to the SVR method, each time we need to solve an instance of the HUC problem, we optimize the performance model so as to obtain the solver configuration that corresponds to the best performance. Preliminary results can be found in the master thesis [32] and, thanks to a PhD grant, we will perform a more extensive computational study and generalize the approach so as to learn how to solve other optimization problems.

Chapter 5

Conclusions and Future Research

In this manuscript, we summarize the main contributions of our research over the last 12 years. Our main research interests are theoretical and applied aspects of mixed integer non linear programming, i.e., optimization problems in which both continuous and integer decision variables are present and, in general, non linear functions define the objective to minimize and the constraints to satisfy. The strategy to tackle this kind of problem was to analyze and take advantage of specific properties that characterize (subclasses of) MINLPs. After an introduction on MINLPs, classical methods to solve them, and the most widely used MINLP subclasses, such as mixed integer linear programming and non linear programming, we presented exact methods for special classes of MINLPs based either on the strengthening of classical relaxations or on non standard relaxations. Then, we described heuristic methods for generic MINLPs or for specific subclasses of MINLPs like the non linear knapsack problem and its generalizations. In the future, we plan to focus on the following methodological aspects of MINLP:

- valid inequalities for new classes of MINLP

- heuristics for new classes of MINLP

- applications such as Energy Optimization and Data Science.

The first topic is particularly interesting for challenging classes of optimization problems such as, for example, mixed integer non linear bilevel programming. Within the EU project on "Mixed-Integer Non-Linear Optimisation: Algorithms and Applications" (MINOA) (https://minoa-itn.fau.de/), we plan to attack this kind of problem with cutting planes and branch-and-cut methods. In this context, the development of both linear and non linear valid inequalities is central: we propose to strengthen linear relaxations and to study non linear relaxations of the problems.

still play an important role in general purpose optimization solvers and in real-world applications of MINLPs, for example in Data Science problems like classification and clustering. Also in this case, it is often possible to identify some properties or characteristics that can be exploited, or some effective algorithms for MILPs that can be extended and generalized to subclasses of MINLPs. For example, we are currently working on hybrid linearization methods for MINLPs, which combine several methodologies to improve standard piecewise linear approaches and produce efficient heuristics.

Finally, we would like to continue investigating the applied optimization aspects of research in energy, with particular attention to renewable energy and smart grids. We are very interested in interdisciplinary research, and the TREND-X initiative of École Polytechnique is a perfect environment to create and develop synergies on this topic. An example of the potential results of this kind of research is the recently submitted paper on "Scenario for wind-solar energy mix in Italy from regional climate simulations" by M. Stéfanon, P. Drobinski, J. Badosa, S. Concettini, A. Creti, C. D'Ambrosio, D. Thomopulos, P. Tankov.

The other application we started being interested in more recently is Data Science and, in particular, Machine Learning. We think that the Optimization community can be very helpful to the Machine Learning community and vice versa. We hope that the creation and consolidation of the DASCIM team (http://www.lix.polytechnique.fr/dascim/) will be a good occasion to bring the two communities closer than they used to be. The work mentioned in Section 4.8 is a first attempt in this direction.

Bibliography

[1] T. Achterberg. Constraint Integer Programming. PhD thesis, Technische Universität Berlin, 2007.

[2] C.S. Adjiman, I.P. Androulakis, and C.A. Floudas. Global optimization of MINLP problems in process synthesis and design. Computers and Chemical Engineering, 21:445–450, 1997.

[3] C.S. Adjiman, I.P. Androulakis, and C.A. Floudas. Global Optimization of Mixed-Integer Nonlinear Problems. AIChE Journal, 46:1769–1797, 2000.

[4] I.P. Androulakis, C.D. Maranas, and C.A. Floudas. αBB: A global optimization method for general constrained nonconvex problems. Journal of Global Optimization, 7:337–363, 1995.

[5] E.M.L. Beale and J.A. Tomlin. Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables. In J. Lawrence, editor, OR 69. Proceedings of the Fifth International Conference on Operational Research, pages 447–454. Tavistock Publications, 1970.

[6] P. Belotti, C. Kirches, S. Leyffer, J. Linderoth, J. Luedtke, and A. Mahajan. Mixed- integer nonlinear optimization. Acta Numerica, 22:1–131, 2013.

[7] P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods and Software, 24(4):597–634, 2009.

[8] P. Bonami, G. Cornuéjols, A. Lodi, and F. Margot. A feasibility pump for mixed integer nonlinear programs. Mathematical Programming, 119(2):331–352, 2009.

[9] C. Buchheim and C. D'Ambrosio. Box-constrained mixed-integer polynomial optimization using separable underestimators. In Integer Programming and Combinatorial Optimization – 17th International Conference, IPCO 2014, volume 8494 of LNCS, pages 198–209, 2014.

61 62 BIBLIOGRAPHY

[10] C. Buchheim and C. D'Ambrosio. Monomial-wise optimal separable underestimators for mixed-integer polynomial optimization. Journal of Global Optimization, 67(4):759–786, 2017.

[11] M.R. Bussieck and S. Vigerske. MINLP Solver Software. John Wiley & Sons, Inc., 2011.

[12] V. Cacchiani and C. D’Ambrosio. A branch-and-bound based heuristic algorithm for convex multi-objective MINLPs. European Journal of Operational Research, 260(3):920 – 933, 2017.

[13] S. Cafieri and C. D’Ambrosio. Feasibility pump for aircraft deconfliction with speed regulation. Journal of Global Optimization, to appear.

[14] A.R. Conn, K. Scheinberg, and L.N. Vicente. Introduction to Derivative-Free Optimization. MPS/SIAM Book Series on Optimization, SIAM, Philadelphia, 2008.

[15] W. Cook, T. Koch, D.E. Steffy, and K. Wolter. An exact rational mixed-integer programming solver. In O. Günlük and G.J. Woeginger, editors, Integer Programming and Combinatorial Optimization, pages 104–116, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.

[16] R.J. Dakin. A tree-search algorithm for mixed integer programming problems. The Computer Journal, 8(3):250–255, 1965.

[17] C. D’Ambrosio. Application-oriented mixed integer non-linear programming. 4OR: A Quarterly Journal of Operations Research, 8:319–322, 2010.

[18] C. D’Ambrosio, A. Frangioni, L. Liberti, and A. Lodi. Experiments with a feasibility pump approach for nonconvex MINLPs. In P. Festa, editor, Experimental Algorithms, pages 350–360, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[19] C. D’Ambrosio, A. Frangioni, L. Liberti, and A. Lodi. A storm of feasibility pumps for nonconvex MINLP. Mathematical Programming B, 136(2):375–402, 2012.

[20] C. D’Ambrosio, F. Furini, M. Monaci, and E. Traversi. On the product knapsack problem. Optimization Letters, 12(4):691–712, 2018.

[21] C. D’Ambrosio, J. Lee, and A. Wächter. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity. In Algorithms – ESA 2009, volume 5757 of LNCS, pages 107–118. Springer, Berlin, 2009.

[22] C. D’Ambrosio, J. Lee, and A. Wächter. An algorithmic framework for MINLP with separable non-convexity. In J. Lee and S. Leyffer, editors, Mixed Integer Nonlinear Programming, volume 154 of The IMA Volumes in Mathematics and its Applications, pages 315–347. Springer, New York, 2012.

[23] C. D’Ambrosio, G. Nannicini, and G. Sartor. MILP models for the selection of a small set of well-distributed points. Operations Research Letters, 45(1):46–52, 2017.

[24] D. Du and X. Hu. Steiner tree problems in computer communication networks. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2008.

[25] M. Duran and I. Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36:307–339, 1986.

[26] M. Fischetti, F. Glover, and A. Lodi. The feasibility pump. Mathematical Programming, 104:91–104, 2004.

[27] A. Frangioni and C. Gentile. Perspective cuts for a class of convex 0-1 mixed integer programs. Mathematical Programming, 106(2):225–236, 2006.

[28] I.E. Grossmann. Review of nonlinear mixed-integer and disjunctive programming techniques. Optimization and Engineering, 3:227–252, 2002.

[29] O.K. Gupta and V. Ravindran. Branch and bound experiments in convex nonlinear integer programming. Management Science, 31:1533–1546, 1985.

[30] D. Henrion, J.B. Lasserre, and J. Loefberg. GloptiPoly 3: moments, optimization and semidefinite programming. Optimization Methods and Software, 24:761–779, 2009.

[31] F.K. Hwang, D.S. Richards, and P. Winter. The Steiner Tree Problem, volume 53 of Annals of Discrete Mathematics. Elsevier, Amsterdam, 1992.

[32] G. Iommazzo. Combining ML and Mathematical Optimization techniques to tackle automatic IBM ILOG CPLEX configuration on Hydro Unit Commitment problems. Master’s thesis, 2017.

[33] R. Jeroslow. There cannot be any algorithm for integer programming with quadratic constraints. Operations Research, 21:221–224, 1973.

[34] P. Kesavan and P.I. Barton. Generalized branch-and-cut framework for mixed-integer nonlinear optimization problems. Computers and Chemical Engineering, 24:1361–1366, 2000.

[35] J. Lee and S. Leyffer, editors. Mixed Integer Nonlinear Programming, volume 154 of The IMA Volumes in Mathematics and its Applications. Springer, 2012.

[36] S. Leyffer. Integrating SQP and branch-and-bound for mixed integer nonlinear programming. Computational Optimization and Applications, 18:295–309, 2001.

[37] S. Leyffer and A. Mahajan. Software For Nonlinearly Constrained Optimization. John Wiley & Sons, Inc., 2011.

[38] L. Liberti. Reformulation and Convex Relaxation Techniques for Global Optimization. PhD thesis, Imperial College London, UK, 2004.

[39] L. Liberti. Reformulation and convex relaxation techniques for global optimization. 4OR: A Quarterly Journal of Operations Research, 2:255–258, 2004.

[40] L. Liberti. Writing global optimization software. In L. Liberti and N. Maculan, editors, Global Optimization: from Theory to Implementation, pages 211–262. Springer, Berlin, 2006.

[41] L. Liberti. Reformulations in mathematical programming: Definitions and systematics. RAIRO-RO, 43(1):55–86, 2009.

[42] L. Liberti, S. Cafieri, and F. Tarissan. Reformulations in mathematical programming: A computational approach. In A. Abraham, A.E. Hassanien, and P. Siarry, editors, Foundations on Computational Intelligence, Vol. 3, Studies in Computational Intelligence, volume 203, pages 153–234. Springer, New York, 2009.

[43] L. Liberti, C. D’Ambrosio, P.-L. Poirion, and S. Toubaline. Measuring smart grids. In Proceedings of AIRO 2015, Pisa, Italy, pages 11–20. 2015.

[44] L. Liberti, G. Nannicini, and N. Mladenovic. A good recipe for solving MINLPs. In V. Maniezzo, T. Stützle, and S. Voss, editors, MATHEURISTICS: Hybridizing metaheuristics and mathematical programming, volume 10 of Annals of Information Systems, pages 231–244. Springer, 2009.

[45] J.T. Linderoth and A. Lodi. MILP Software. John Wiley & Sons, Inc., 2011.

[46] C. Lizon, C. D’Ambrosio, and L. Liberti. Méthode non linéaire mixte en variables entières et réelles pour le placement et la géométrie des puits en ingénierie de réservoir. In Proceedings of Journées Polyèdres et Optimisation Combinatoire (JPOC9), Le Havre, France, 2015.

[47] C. Lizon, C. D’Ambrosio, L. Liberti, M. Le Ravalec, and D. Sinoquet. A mixed-integer nonlinear optimization approach for well placement and geometry. In Proceedings of ECMOR XIV – 14th European Conference on the Mathematics of Oil Recovery, Catania, Italy, pages 1838–1848, 2014.

[48] A. Lodi. MIP computation and beyond. In M. Jünger, T. Liebling, D. Naddef, W. Pulleyblank, G. Reinelt, G. Rinaldi, and L. Wolsey, editors, 50 Years of Integer Programming 1958–2008. Springer Verlag, 2008.

[49] N. Maculan, P. Michelon, and A.E. Xavier. The Euclidean Steiner tree problem in R^n: A mathematical programming formulation. Annals of Operations Research, 96(1–4):209–220, 2000.

[50] G.P. McCormick. Computability of global solutions to factorable nonconvex programs: Part I – Convex underestimating problems. Mathematical Programming, 10:147–175, 1976.

[51] L. Mencarelli, C. D’Ambrosio, A. Di Zio, and S. Martello. Heuristics for the general multiple non-linear knapsack problem. Electronic Notes in Discrete Mathematics, 55(Supplement C):69–72, 2016. 14th Cologne-Twente Workshop on Graphs and Combinatorial Optimization (CTW16).

[52] R. Misener and C.A. Floudas. Advances for the pooling problem: Modeling, global optimization, & computational studies. Applied and Computational Mathematics, 8:3–22, 2009.

[53] G. Nannicini and P. Belotti. Rounding-based heuristics for nonconvex MINLPs with binary variables. Working paper, 2009.

[54] J. Nocedal and S.J. Wright. Numerical Optimization. Springer Series in Operations Research, 2006.

[55] I. Nowak. Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming. International Series of Numerical Mathematics, Birkhäuser Verlag, 2005.

[56] I. Nowak and S. Vigerske. LaGO – a (heuristic) branch and cut algorithm for nonconvex MINLPs. Central European Journal of Operations Research, 16:127–138, 2008.

[57] M. Padberg, T. J. Van Roy, and L. Wolsey. Valid linear inequalities for fixed charge problems. Operations Research, 33:842–861, 1985.

[58] P.-L. Poirion, S. Toubaline, C. D’Ambrosio, and L. Liberti. The power edge set problem. Networks, 68(2):104–120, 2016.

[59] R. Pörn and T. Westerlund. A cutting plane method for minimizing pseudo-convex functions in the mixed-integer case. Computers and Chemical Engineering, 24:2655–2665, 2000.

[60] I. Quesada and I.E. Grossmann. An LP/NLP based branch and bound algorithm for convex MINLP optimization problems. Computers and Chemical Engineering, 16:937–947, 1992.

[61] I. Quesada and I.E. Grossmann. Global optimization of bilinear process networks and multicomponent flows. Computers & Chemical Engineering, 19(12):1219–1242, 1995.

[62] R. Kannan and C. Monma. On the computational complexity of integer programming problems. Lecture Notes in Economics and Mathematical Systems, 157:161–17, 1978.

[63] H. Ryoo and N.V. Sahinidis. A branch-and-reduce approach to global optimization. Journal of Global Optimization, 8:107–138, 1996.

[64] N.V. Sahinidis. BARON: a general purpose global optimization software package. Journal of Global Optimization, 8:201–205, 1996.

[65] Y. Sahraoui, P. Bendotti, and C. D’Ambrosio. Real-world hydro-power unit-commitment: Dealing with numerical errors and feasibility issues. Energy, to appear.

[66] SCIP. http://scip.zib.de/scip.shtml.

[67] H.D. Sherali and W.P. Adams. A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Kluwer Academic Publishers, Dordrecht, 1999.

[68] E.M.B. Smith and C.C. Pantelides. A symbolic reformulation/spatial branch and bound algorithm for the global optimization of nonconvex MINLPs. Computers and Chemical Engineering, 23:457–478, 1999.

[69] M. Tawarmalani and N. Sahinidis. Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer, Dordrecht, 2002.

[70] M. Tawarmalani and N.V. Sahinidis. Global optimization of mixed-integer nonlinear programs: A theoretical and computational study. Mathematical Programming, 99:563–591, 2004.

[71] S. Toubaline, P.-L. Poirion, C. D’Ambrosio, and L. Liberti. Observing the state of a smart grid using bilevel programming. In Z. Lu, D. Kim, W. Wu, W. Li, and D.-Z. Du, editors, Combinatorial Optimization and Applications, pages 364–376, Cham, 2015. Springer International Publishing.

[72] J.P. Vielma, S. Ahmed, and G. Nemhauser. Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions. Operations Research, 58(2):303–315, 2010.

Appendix A

Selected Publications

1. C. D’Ambrosio, A. Lodi. Mixed integer nonlinear programming tools: an updated practical overview, Annals of Operations Research, 204, pp. 301–320, 2013.

2. C. D’Ambrosio. Application-oriented Mixed Integer Non-Linear Programming. 4OR: A Quarterly Journal of Operations Research, 8 (3), pp. 319-322, 2010.

3. C. D’Ambrosio, M. Fampa, J. Lee, S. Vigerske. On a Nonconvex MINLP Formulation of the Euclidean Steiner Tree Problem in n-Space, Lecture Notes in Computer Science, vol. 9125, International Symposium on Experimental Algorithms – SEA 2015, E. Bampis ed., pp. 122–133, 2015.

4. C. D’Ambrosio, J.T. Linderoth, J. Luedtke. Valid Inequalities for the Pooling Problem with Binary Variables, Lecture Notes in Computer Science, vol. 6655, Integer Programming and Combinatorial Optimization Conference – IPCO 2011, O. Günlük and G.J. Woeginger eds., Springer-Verlag, pp. 117–129, 2011.

5. C. D’Ambrosio, J. Lee, A. Wächter. An algorithmic framework for MINLP with separable non-convexity, J. Lee and S. Leyffer (Eds.): Mixed-Integer Nonlinear Optimization: Algorithmic Advances and Applications, The IMA Volumes in Mathematics and its Applications, Springer New York, 154, pp. 315–347, 2012.

6. C. Buchheim, C. D’Ambrosio. Monomial-wise Optimal Separable Underestimators for Mixed-Integer Polynomial Optimization, Journal of Global Optimization, 67 (4), pp. 759–786, 2017.

7. C. D’Ambrosio, A. Frangioni, L. Liberti, A. Lodi. A Storm of Feasibility Pumps for Nonconvex MINLP, Mathematical Programming, 136 (2), pp. 375–402, 2012.

8. R. Rovatti, C. D’Ambrosio, A. Lodi, S. Martello. Optimistic MILP Modeling of Non-linear Optimization Problems, European Journal of Operational Research, 239 (1), pp. 32–45, 2014.

9. C. D’Ambrosio, S. Martello. Heuristic algorithms for the general nonlinear separable knapsack problems, Computers and Operations Research, 38 (2), pp. 505–513, 2011.

Ann Oper Res
DOI 10.1007/s10479-012-1272-5

Mixed integer nonlinear programming tools: an updated practical overview

Claudia D’Ambrosio · Andrea Lodi

© Springer Science+Business Media New York 2013

Abstract We present a review of available tools for solving mixed integer nonlinear pro- gramming problems. Our aim is to give the reader a flavor of the difficulties one could face and to discuss the tools one could use to try to overcome such difficulties.

1 Introduction

Despite the fact that Jeroslow (1973) proved that mixed integer nonlinear programming (MINLP) is undecidable, in recent years there has been a renewed interest in practically solving MINLP problems. Indeed, under the often reasonable assumption of boundedness of the integer variables, it is well known that MINLP problems are NP-hard because they are the generalization of mixed integer linear programming (MILP) problems, which are NP-hard themselves. Such a renewed interest can be explained by different considerations: (i) the large improvement in the performance of solvers handling nonlinear constraints; (ii) the increased awareness that many real-world applications can be modeled as MINLP problems; and (iii) the challenging nature of this very general class of problems. Indeed, many real-world optimization problems can be cast in this interesting class, such as those arising from portfolio optimization, design of water distribution networks, design of complex distillation systems, optimization of manufacturing, cutting-stock problems, and transportation logistics, just to mention a few. The (general) mixed integer nonlinear programming problem we are interested in has the following form:

This is an updated version of the paper that appeared in 4OR, 9(4), 329–349 (2011).

C. D’Ambrosio
CNRS LIX, École Polytechnique, 91128 Palaiseau, France
e-mail: [email protected]

A. Lodi
DEI, University of Bologna, viale Risorgimento 2, 40136 Bologna, Italy
e-mail: [email protected]

MINLP

min   f(x, y)          (1)
s.t.  g(x, y) ≤ 0      (2)
      x ∈ X ∩ Z^n      (3)
      y ∈ Y,           (4)

where f : R^{n+p} → R, g : R^{n+p} → R^m, and X and Y are two polyhedra of appropriate dimension (including bounds on the variables). We assume that f and g are twice continuously differentiable, but we do not make any further assumptions on the characteristics of these functions or their convexity/concavity. In the following, we will call problems of this type nonconvex mixed integer nonlinear programming problems. The most complex aspect one needs to keep in mind when working with nonconvex MINLPs is that their continuous relaxation, i.e., the problem obtained by relaxing the integrality requirement on the x variables, might have (and usually has) local optima, i.e., solutions that are optimal within a restricted part of the feasible region (a neighborhood) but are not optimal with respect to the entire feasible region. This does not happen when f and g are convex (or linear): in such cases the local optima are also global optima.
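To illustrate the local-optima issue just described, the following sketch (a toy example of ours, not from the paper) runs a crude multistart descent on a nonconvex univariate function; different starting points reach different local minimizers, so a single local solve gives no global guarantee:

```python
import random

# Nonconvex univariate objective: the continuous relaxation of a nonconvex
# MINLP can have several local optima, only one of which is global.
def f(x):
    return x**4 - 4*x**3 + 2*x**2 + 3*x

def local_descent(x, step=1e-3, tol=1e-8, max_iter=200000):
    """Crude gradient descent with finite-difference gradients: it converges
    to a local optimum that depends on the starting point."""
    for _ in range(max_iter):
        grad = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6
        if abs(grad) < tol:
            break
        x -= step * grad
    return x

random.seed(0)
# Multistart: different starting points may reach different local minimizers
# (here, one near x = -0.35 and one near x = 2.47).
optima = sorted({round(local_descent(random.uniform(-1, 4)), 3) for _ in range(20)})
print(optima)
```

This is exactly why standard NLP solvers, which stop at a local optimum, cannot certify global optimality on nonconvex relaxations.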

Contribution of the paper The aim of this paper is to present tools for “practically” solving MINLPs, i.e., finding a high-quality feasible solution, neglecting the difference between local and global solutions. In particular, the focus is not on globally solving a nonconvex MINLP, i.e., unless otherwise stated, when we use the term “solving” in the following we do not mean exactly (globally). Note that this is a “reasonable” setting for real-world applications, especially those involving nonconvex constraints. On the other hand, proving global optimality for nonconvex MINLPs is, to some extent, possible, and we briefly overview the most commonly used methods, referred to as Global Optimization algorithms. It is easy to see that global optimality comes at the price of (much) more computationally challenging and expensive algorithmic approaches, which turn out to be impractical for large-scale problems. In Sect. 2 we point out some difficulties arising from handling nonlinear functions within an optimization model and how modeling languages can help to overcome some of them. We then review the solvers for special classes of MINLP: mixed integer linear programming in Sect. 3 and nonlinear programming (NLP) in Sect. 4. In these two cases, only aspects closely related to MINLP problems are presented, but references to books and papers on MILP and NLP are provided for the interested reader. On the other hand, the main focus is on MINLP methods (Sect. 5) and solvers (Sect. 6). We devote Sect. 7 to the description of the “NEOS server for optimization”. Conclusions follow in Sect. 8. The first version of this survey (D’Ambrosio and Lodi 2011) was published in the international journal 4OR in 2011.

2 Modeling languages

Modeling languages are used to express optimization models (linear programming (LP), NLP, MILP, MINLP) with an intuitive syntax, avoiding the need for the user to implement software using programming languages. They translate a model from a form that is easy to read for the user into a form readable by the solvers. The most intuitive modeling languages are the Algebraic Modeling Languages (AMLs), widely used in (mixed integer) nonlinear programming. Their syntax is similar to the mathematical notation, which makes the resulting models easy to read for the user. It is also flexible, allowing the user to implement complicated algorithms within the modeling language framework, and compact, thanks to the use of abstract entities like sets and indices, making even large instances easy to write. Another important feature of AMLs is the nonlinear interpreter: using symbolic and/or automatic differentiation, it provides the NLP solver with function evaluations and first and second derivatives at a given point. This feature makes AMLs a fundamental tool for (mixed integer) nonlinear programming because the most widely used methods for solving such problems need this information. The basic functionalities of AMLs are: (a) reading the model and the data provided by the user; (b) expanding this compact problem formulation into the problem instantiation (expanding the indices, substituting the parameters, etc.); (c) providing the solver with an appropriate form of the problem instantiation; (d) providing, when needed, the value of a function and its first and second derivatives at a point given by the solver; and (e) reading the solution provided by the solver and translating it into a form readable by the user. Note that AMLs do not have solver capabilities and use external solvers as black boxes, but provide standard interfaces to the most common solvers.
Moreover, they usually provide effective preprocessing techniques to detect trivial infeasibility and simplify the models. The two most widely used algebraic modeling languages in MINLP are AMPL (Fourer et al. 2003) and GAMS (Brooke et al. 1992). They are commercial products, but an evaluation version of both modeling languages is available for students (allowing a maximum of 300 constraints and variables).
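The derivative information mentioned above is typically obtained by automatic differentiation. A minimal forward-mode sketch using dual numbers (a toy illustration of the principle, not the actual implementation of any AML) is:

```python
import math

class Dual:
    """Number of the form v + d*eps with eps^2 = 0; d carries the derivative."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.d + o.d)
    __radd__ = __add__
    def __mul__(self, o):
        # product rule: (u*w)' = u'w + uw'
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.d * o.v + self.v * o.d)
    __rmul__ = __mul__
    def sin(self):
        # chain rule: (sin u)' = cos(u) * u'
        return Dual(math.sin(self.v), math.cos(self.v) * self.d)

def derivative(f, x0):
    """Exact first derivative of f at x0 via one forward-mode pass."""
    return f(Dual(x0, 1.0)).d

# g(x) = x^2 + sin(x), so g'(x) = 2x + cos(x)
g = lambda x: x * x + x.sin()
print(derivative(g, 1.0))  # 2*1 + cos(1), about 2.5403
```

An AML's nonlinear interpreter plays this role for the solver: given an evaluation point, it returns exact derivative values without symbolic formulas from the user.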

3 Mixed integer linear programming technology

In this section we present considerations about MILP solvers, limited to aspects related to MINLP.1 Until recent years, the standard approach for handling MINLP problems (especially by scientists with a combinatorial optimization background) has basically been to solve an MILP “approximation” of it. Note that solving an MILP approximation of an MINLP does not necessarily mean solving a relaxation of it. The optimal solution of the MILP might be neither optimal nor feasible for the original problem, if no assumption is made on the MINLP structure. A standard technique to approximate special classes of MINLP problems with an MILP model is, for example, piecewise linear approximation (when the nonlinear functions are univariate); in this case, Special Ordered Sets (SOS) of type 2 can be used to model the convex combinations that form the piecewise linear approximation. By defining a set of variables as a special ordered set of type 2 (SOS2), one imposes that at most two such variables can take a nonzero value and that they must be adjacent (see, e.g., Beale and Tomlin 1970). Some of the available MILP solvers recognize the definition of special ordered sets of types 1 and 2 and treat these sets of variables through special branching rules. However, it is possible to use any kind of MILP solver and to define a piecewise linear function by adding suitable binary variables. In the presence of special properties of the approximated functions, (even) a standard piecewise linear approximation might result in a relaxation of the original problem. However,

this is in general not the case. Alternatively, methods tailored to solving MINLPs can be employed, which consist in solving MILP relaxations of the MINLP problem. Algorithms in this category are, for example, outer approximation, generalized Benders decomposition, and extended cutting plane (see Sect. 5 and Grossmann 2002). In these cases an MILP is obtained starting from the original MINLP, but the approximation is constructed on purpose to provide a relaxation, and it is tightened at runtime. From a software viewpoint, another important role MILP plays in the solution of MINLP problems is that some MINLP techniques use MILP solvers as black boxes to address carefully defined MILP subproblems. These techniques are described as well in Sect. 5. We conclude this section by noting that in the MILP case there are at least a couple of standard file formats widely used by MILP solvers, namely MPS and LP files. No fully accepted file format is available for models in which nonlinear functions are involved, although the AMPL file format has recently become quite standard. We will further discuss this issue in Sects. 4 and 6. Finally, examples of MILP solvers are the commercial GUROBI, IBM-CPLEX, and XPRESS Optimizer, and the open-source (or partially open-source) solvers CBC and SCIP. For a survey of the development of methods and software for MILPs the reader is referred, e.g., to Lodi (2009) and, for detailed discussions, to Achterberg (2007) and Linderoth and Lodi (2011).

1 Note that we do not consider MINLPs that can be exactly reformulated as MILP problems.
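The piecewise linear approximation discussed in this section can be illustrated with a short sketch (our toy example; in an actual MILP model the interpolation weights would be SOS2 variables chosen by the solver, not computed directly):

```python
# Piecewise linear approximation of a univariate nonlinear function over
# breakpoints. The two weights lam and 1-lam on adjacent breakpoints mirror
# the SOS2 condition: at most two nonzero weights, and they must be adjacent.
def pwl_approx(f, breakpoints, x):
    """Value at x of the piecewise linear interpolant of f on the breakpoints."""
    for a, b in zip(breakpoints, breakpoints[1:]):
        if a <= x <= b:
            lam = (b - x) / (b - a)          # weight on breakpoint a
            return lam * f(a) + (1 - lam) * f(b)
    raise ValueError("x outside breakpoint range")

f = lambda t: t ** 2
bps = [0.0, 1.0, 2.0, 3.0]
print(pwl_approx(f, bps, 1.5))  # 2.5, overestimating t^2 = 2.25
```

Since f here is convex, the interpolant overestimates f between breakpoints, which is why only special structure (e.g., approximating from below a convex function being minimized) turns such an approximation into a relaxation.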

4 Nonlinear programming technology

Nonlinear programming subproblems play a fundamental role in the solution of MINLPs. Different approaches to the solution of NLPs are possible; thus, different types of solvers are available, both commercial and open-source. The most commonly used and effective NLP solvers need the first and usually second derivatives of the nonlinear functions involved in the model; the nonlinear functions are therefore assumed to be smooth. The necessity of derivatives is also the cause of the lack of a fully accepted file format to express NLP/MINLP problems: providing the first and, possibly, the second derivative at a given point is not straightforward for non-expert users and can easily be a source of errors. The use of modeling languages helps to partially overcome such difficulties. However, the information about derivatives sometimes cannot be computed. In this case, derivative-free optimization methods can be applied. For details on this topic, the reader is referred to the book by Conn et al. (2008). An additional aspect that NLP solvers have in common is that they only guarantee that the solution provided is a local optimum. Thus, when we use standard NLP solvers, we might not obtain the globally optimal solution. This is why some MINLP solvers are exact algorithms for convex MINLPs (where local optima are also global optima) but only heuristics for nonconvex MINLPs. We will discuss this issue in detail in Sect. 5. An alternative would be to solve NLP relaxations of (nonconvex) MINLPs to optimality by means of global optimization algorithms. Of course, that would come at the cost of the solver's efficiency and, for this reason, global optimization techniques are usually not used in MINLP solvers for handling the NLP subproblems. Any serious discussion of NLP algorithms would lead us outside the scope of the present paper.
We refer the reader to Nocedal and Wright (2006), where, at the end of each chapter describing an NLP method, a discussion about software is also provided. Examples of the most widely used solvers are CONOPT, FILTERSQP, IPOPT, KNITRO, MINOS, MOSEK, NPSOL, and SNOPT. For an overview of NLP software, the reader is referred to Leyffer and Mahajan (2011).

5 Mixed integer nonlinear programming

In this section we first describe the most commonly used algorithms for solving convex MINLP problems, and then introduce some specific considerations on nonconvex MINLPs. Although, as mentioned in the Introduction, the focus of the paper is on tools providing local solutions for MINLPs, if a problem is recognized to have a convex NLP relaxation, then special methods can be exploited; in addition, these methods provide local solutions for nonconvex MINLPs as well. We do not describe special-purpose methods, such as, for example, those for semidefinite programming or mixed integer quadratic programming, but only general-purpose MINLP techniques. The interested reader is referred to (among many possible references) the recent MINLP software survey in the Wiley Encyclopedia of Operations Research and Management Science (Bussieck and Vigerske 2011). For a collection of some of the most exciting latest advances on MINLP, we refer the reader to the recent book (Lee and Leyffer 2012).

5.1 Convex MINLP

Algorithms studied for convex MINLPs basically differ in the way the subproblems, namely MILPs and NLPs, are defined and used. Unless otherwise stated, we assume that those subproblems are solved to optimality, which is reasonable in this setting for NLPs because of the convexity assumption. In the following, we briefly present some of the most commonly used algorithms and refer the reader to the paper by Grossmann (2002) for an exhaustive discussion. At the end of the section, we summarize the different characteristics of the presented algorithms in Table 1.

– Branch-and-Bound (BB): Originally proposed for MILP problems (see Land and Doig 1960), it was first applied to convex MINLPs by Dakin (1965) and Gupta and Ravindran (1985). The first step solves the continuous relaxation of the MINLP, NLP^0. Then, given a fractional value x_j^* in the solution (x^*, y^*) of the continuous relaxation, the problem is divided into two subproblems by alternatively adding the constraint x_j ≤ ⌊x_j^*⌋ or the constraint x_j ≥ ⌊x_j^*⌋ + 1. Each of these new constraints represents a “branching decision” (imposed by simply changing the bounds on the variables) because the partition of the problem into subproblems is represented with a tree structure, the BB tree. The process is iterated for each node until the solution of the continuous relaxation of the subproblem is integer feasible, or is infeasible, or the lower bound value associated with the subproblem is not smaller than the value of the current incumbent solution, i.e., the best feasible solution encountered so far. In these three cases, the node is fathomed. The algorithm stops when no node is left to explore, returning the incumbent solution, which is proven to be optimal. The basic difference between the BB for MILPs and for MINLPs is that, at each node, the subproblem solved is an LP in the former case and an NLP in the latter. We do not discuss specific approaches for branching variable selection, tree exploration strategy, etc.
The reader is referred to Abhishek (2008), Bonami et al. (2008), Gupta and Ravindran (1985), Leyffer (1999) for details.

– Outer-Approximation (OA): Proposed by Duran and Grossmann (1986), it exploits the outer approximation linearization technique. Namely, given a point (x^*, y^*), we can derive the OA cut

    g_j(x^*, y^*) + ∇g_j(x^*, y^*)^T [ x − x^* ; y − y^* ] ≤ 0.

Note that, by convexity, the OA cut is “safe”, i.e., it does not cut off any solution of the MINLP. OA is an iterative method in which, at each iteration k, two subproblems are solved:
• NLP^k_x, an NLP defined by fixing the integer variables to the values x^{k−1};2 and
• MILP^k, an MILP relaxation of the original MINLP obtained by the OA cuts generated so far from the solutions of NLP^k'_x with k' ≤ k.
The former subproblem, if feasible, gives an upper bound on the solution of the MINLP, while the latter always provides a lower bound. If the NLP^k_x subproblem is not feasible, a “feasibility NLP subproblem” is solved and its optimal solution (x^*, y^*) is used to derive the new OA cuts for MILP^k. The feasibility NLP subproblem is defined as min{u | g(x^{k−1}, y) ≤ u, y ∈ Y, u ≥ 0}, where u is an additional variable that represents the maximum constraint violation, which is minimized. At each iteration, the lower and upper bounds might be improved. In particular, the definition of NLP^k_x changes because the x vector changes. Moreover, the definition of MILP^k changes because of the addition, at each iteration, of OA cuts, which make the solution of the previous iteration infeasible. The algorithm ends when the gap between the two bounds is within a fixed tolerance.

– Generalized Benders Decomposition (GBD): Proposed by Geoffrion (1972), this is the extension to MINLP of the famous work of Benders (1962) for MILP. The basic idea of this iterative method is to decompose, at each iteration, the MINLP into: (i) a simpler NLP subproblem obtained by fixing the x variables; and (ii) an (MILP) master problem that is the relaxation of the projection of the MINLP onto the x-space. In the spirit of OA cuts, if the NLP is feasible, a so-called optimality cut (obtained by projecting onto the x-space the weighted combination of the linearizations of the objective function and the active constraints) is added to the master problem.
Otherwise, a so-called feasibility cut (obtained by projecting onto the x-space the weighted combination of the linearizations of the active constraints) is added to it. This method is strongly related to the OA method, the unique difference being the form of the MILP^k subproblem. The MILP^k of the GBD method is a surrogate relaxation of the one of the OA method. Thus, the lower bound given by OA is stronger than (i.e., greater than or equal to) the one given by GBD (for details, see Duran and Grossmann 1986). Even if, on average, the GBD method requires more iterations than the OA method, the fact that the MILP^k subproblems of the former are more “tractable” (smaller) can sometimes make it convenient to use GBD instead of OA.

– Extended Cutting Plane (ECP): Introduced by Westerlund and Pettersson (1995), the method is based on the iterative resolution of an MILP^k subproblem and, when its solution (x^k, y^k) is infeasible for the MINLP, on the determination of the most violated constraint (or constraints) J^k = {j ∈ arg max{g_j(x^k, y^k), j = 1, ..., m}}. The linearization of such constraints is added to the next MILP^k. The produced lower bound increases at each iteration, but generally a large number of iterations is needed to reach the optimal solution. Obviously, whenever (x^k, y^k) becomes feasible for the MINLP, such a solution is optimal and the algorithm terminates. An extension to pseudo-convex MINLP problems3 has been presented.

– LP/NLP based Branch-and-Bound (QG): Introduced by Quesada and Grossmann (1992), the method can be seen as the extension of the Branch-and-Cut approach to convex MINLPs. The idea is to use BB to solve the MILP^k obtained from the solution (x^0, y^0)

2 If k = 1, no fixing is performed.
3 A pseudo-convex MINLP is an MINLP involving pseudo-convex functions. Informally, a function is pseudo-convex if it behaves like a convex function with respect to finding its local minima, but need not actually be convex; see, e.g., Mangasarian (1965).

Table 1 Number of MILP and NLP subproblems solved by each algorithm of Sect. 5.1.

Algorithm                    #MILP          #NLP                                   Note
BB                           0              # nodes
OA                           # iterations   # iterations
GBD                          # iterations   # iterations                           weaker lower bound w.r.t. OA
ECP                          # iterations   0
QG                           1              1 + # explored MILP solutions
Hyb (Abhishek et al. 2010)   1              1 + # explored MILP solutions          stronger lower bound w.r.t. QG
Hyb (Bonami et al. 2008)     1              [# explored MILP solutions, # nodes]

of the NLP relaxation. Whenever an integer feasible solution x of MILP^k is found, an NLP_x^k subproblem is solved, obtaining (x^k, y^k). OA cuts generated from (x^k, y^k) are added to all the open nodes of the current BB tree. The QG algorithm differs from ECP because at some nodes NLP subproblems are solved, and from BB because they are not solved at each node.
– Hybrid algorithms (Hyb): Proposed by Bonami et al. (2008) and Abhishek et al. (2010), these are enhanced versions of the QG algorithm. They are called hybrid because they combine the BB and OA methods. The version proposed by Bonami et al. (2008) differs from the QG algorithm because: (i) at “some” nodes (not only when an integer solution is found, as in QG) the NLP_x^k subproblem is solved to generate OA cuts (as in BB); and (ii) local search is performed at some nodes of the tree by partial enumeration of the associated MILP relaxations. This is done to improve the bounds and collect OA cuts early in the tree. If the local enumeration is not limited, the algorithm reduces to OA, while when the NLP_x^k is solved at each node, it reduces to BB. The version proposed by Abhishek et al. (2010) differs from the one proposed by Bonami et al. (2008) in the way linearizations are added and managed. In particular, NLP subproblems are solved only when an integer MILP feasible solution is found (as in the original QG algorithm), but linearization cuts are also added at runtime at other nodes (with fractional x’s). When no linearization cut is added at any other node, the algorithm reduces to QG, while when linearization cuts are added at each node, it reduces to ECP.
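All the convex-MINLP methods above (OA, ECP, QG, Hyb) rely on the same first-order linearization of a convex constraint g(v) ≤ 0 at a point v^k, namely g(v^k) + ∇g(v^k)·(v − v^k) ≤ 0, which is valid for every feasible point when g is convex. A minimal sketch, with hypothetical helper names, assuming the gradient is supplied by the caller:

```python
# Sketch of the outer-approximation (gradient) cut used by OA/ECP/QG/Hyb:
# for a convex constraint g(v) <= 0, the linearization at a point vk,
#     g(vk) + grad_g(vk) . (v - vk) <= 0,
# never cuts off a feasible point (one with g(v) <= 0).

def oa_cut(g, grad_g, vk):
    """Return (a, b) such that the cut reads a . v <= b."""
    gk = g(vk)
    ak = grad_g(vk)
    # g(vk) + ak . (v - vk) <= 0   <=>   ak . v <= ak . vk - g(vk)
    b = sum(ai * vi for ai, vi in zip(ak, vk)) - gk
    return ak, b

# Example: the convex constraint g(x, y) = x^2 + y^2 - 4 <= 0 (a disk).
g = lambda v: v[0] ** 2 + v[1] ** 2 - 4.0
grad_g = lambda v: [2.0 * v[0], 2.0 * v[1]]

a, b = oa_cut(g, grad_g, [2.0, 2.0])        # linearize at an infeasible point
violated = sum(ai * vi for ai, vi in zip(a, [2.0, 2.0])) > b
print(a, b, violated)                       # the cut separates (2, 2)
```

Here the cut generated at the infeasible point (2, 2) is 4x + 4y ≤ 12; it cuts off (2, 2) itself while keeping every point of the disk, which is exactly the separation step performed at each iteration of these algorithms.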

As anticipated, Table 1 summarizes the differences in the way the algorithms presented above address convex MINLPs. Specifically, the table gives an estimate of the number of MILP and NLP subproblems solved by each method, and summarizes some of the discussed relationships between pairs of algorithms. We end the section by briefly mentioning that an important role in the practical effectiveness of the discussed algorithms is played by (primal) heuristics, i.e., algorithms aimed at improving the incumbent solution (upper bound) within the various phases of the enumerative schemes. Part of the heuristic algorithms studied for MILPs have been adapted to convex MINLP problems. For details about primal heuristics the reader is referred, e.g., to Abhishek (2008), Bonami et al. (2009), Bonami and Gonçalves (2008).
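A classic primal heuristic of the kind adapted from MILP is "NLP rounding": solve the continuous relaxation, round the integer variables, then fix them and re-solve the remaining NLP. The toy sketch below (all names hypothetical) uses a separable box-constrained problem whose subproblems have closed-form solutions, so no NLP solver is needed:

```python
# Toy sketch of an NLP-rounding primal heuristic for
#     min (x - 2.6)^2 + (y - 1.3)^2,   x integer in [0, 5],   y in [0, 3].
# (1) solve the continuous relaxation, (2) round the integer variable,
# (3) fix it and re-solve in the continuous variable.

def solve_relaxation():
    # unconstrained minimizer of the separable objective, clipped to the box
    return min(max(2.6, 0.0), 5.0), min(max(1.3, 0.0), 3.0)

def rounding_heuristic():
    x_rel, y_rel = solve_relaxation()   # (2.6, 1.3)
    x_fix = round(x_rel)                # round the integer variable -> 3
    # re-solve in y with x fixed; here y is unaffected (separable objective)
    y_opt = min(max(1.3, 0.0), 3.0)
    ub = (x_fix - 2.6) ** 2 + (y_opt - 1.3) ** 2
    return x_fix, y_opt, ub             # a feasible point and an upper bound

print(rounding_heuristic())
```

The heuristic returns the feasible point (3, 1.3) with objective value 0.16, a valid upper bound that an enumerative scheme can use as an incumbent.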

Fig. 1 Example of an “unsafe” linearization cut generated from a nonconvex constraint

5.2 Nonconvex MINLP

Coming back to the first model seen in this paper, MINLP, if we do not make any convexity assumption on the objective function and the constraints, then one of the main issues is that the NLP problems obtained by dropping the integrality requirements have, in general, local minima which are not global. This implies that, if the NLP solver used to solve the NLP subproblems does not guarantee that the solution provided is a global optimum (and this is usually the case for the main NLP solvers, e.g., those discussed in Sect. 4), feasible and even optimal solutions might be fathomed, for example, by comparison with an invalid lower bound. As a result, methods like BB, QG, and Hyb that are exact for convex MINLPs work as heuristics for nonconvex MINLPs. A second issue involves methods like OA, GBD, ECP, QG, and Hyb: the linearization cuts used by these algorithms are in general not valid for nonconvex constraints. It follows that the linearization cuts might cut off not only infeasible points, but also parts of the feasible region, as depicted in Fig. 1. Adding the cut represented by the dashed line in the figure would result in cutting off the upper part of the nonconvex feasible region, i.e., the part above the cut. For this reason, if nonconvex constraints are involved, one has to use linearization cuts carefully or accept that the final solution might not be the optimal one. As for methods that specifically handle nonconvexities, the first approach is to fully reformulate the problem. An exact reformulation is possible only in limited cases of nonconvex MINLPs and allows one to obtain an equivalent convex formulation of the nonconvex MINLP in the sense that: (i) all feasible pairs (x, y) ∈ Z^n × R^p are mapped to unique and feasible points of the feasible region of the convex formulation; and (ii) no other mixed integer (infeasible) solution (x, y) ∈ Z^n × R^p belongs to it. All the techniques described in Sect.
5.1 can then be (safely) applied to the reformulated MINLP. For a detailed description of exact reformulations to standard forms, see, for example, Liberti et al. (2009a). The second approach, which applies to a larger subset of nonconvex MINLPs, is based on the use of convex envelopes or underestimators of the nonconvex feasible region. The relaxation of the original problem has the form:

rMINLP:   min z                (5)
          f̲(x, y) ≤ z          (6)
          g̲(x, y) ≤ 0          (7)
          x ∈ X ∩ Z^n          (8)
          y ∈ Y,               (9)

Fig. 2 Linear underestimators before and after branching on continuous variables

where z is an auxiliary variable introduced as a standard modeling trick to obtain a linear objective function, see (5) and (6). Moreover, f̲ : R^{n+p} → R and g̲ : R^{n+p} → R^m are convex (in some cases, linear) functions such that f̲(x, y) ≤ f(x, y) and g̲(x, y) ≤ g(x, y) within the (x, y) feasible space. Roughly speaking, the first step to reformulate the nonconvex problem MINLP into a “tractable” form like the mathematical programming problem (5)–(9) is to associate a convex relaxation with each part of the original model. This is obtained by adding auxiliary variables that “linearize” the nonlinearities of the original constraints. This way, complex nonlinearities are subdivided into simpler nonlinearities, at the cost of additional variables and constraints. The basic nonlinearities are then relaxed using simple envelopes/underestimators. This type of reformulation by pieces only applies to factorable functions, i.e., functions that can be expressed as summations and products of univariate functions after appropriate recursive splitting. More precisely, a function f is factorable if it can be rewritten as:

f = Σ_h Π_k f_hk(x),

where the f_hk(x) are univariate functions for all h, k. These functions can be reduced and reformulated into simpler nonlinear functions, adding auxiliary variables and constraints such as s_hk = f_hk(x), s_h = Π_k s_hk, f = Σ_h s_h. This modeling trick allows reformulating the model through predetermined operators for which convex underestimators are known, such as, for example, bilinear, trilinear, or fractional terms (see, e.g., Liberti 2006; McCormick 1976; Smith and Pantelides 1999). The use of underestimators makes the feasible region larger: if the optimal solution of rMINLP is feasible for the nonconvex MINLP, then it is also its global optimum. Otherwise, a refinement of the underestimation is needed.
This is done by branching, not only on integer variables but also on continuous ones, as depicted in Fig. 2. This technique allows one to obtain a lower bound on the nonconvex MINLP optimum that can be used within an algorithm, such as the specialized branch-and-bound versions for global optimization, namely spatial branch-and-bound (Smith and Pantelides 1999; Leyffer 2001; Liberti 2004a; Belotti et al. 2009), branch-and-reduce (Ryoo and Sahinidis 1996; Sahinidis 1996; Tawarmalani and Sahinidis 2004), α-BB (Androulakis et al. 1995; Adjiman et al. 1997), and branch-and-cut (Kesavan and Barton 2000; Nowak and Vigerske 2008). Such algorithms provide a solution that is guaranteed to be a global optimum. The above specialized branch-and-bound methods for global optimization mainly differ in the branching scheme adopted: they might either concentrate first on discrete variables (until an integer feasible solution is found) and then branch on continuous variables, or branch on both variable types without a prefixed priority. From the algorithmic viewpoint, one could also branch first on continuous variables appearing in nonconvex terms (see, e.g., Adjiman et al. 2000 for details), but this option is not available in the solvers, at least those we consider in Sect. 6. It is clear that algorithms of this type are very time-consuming in general and can be employed only for small- to medium-size instances. This is the price one has to pay for the guarantee of global optimality of the solution provided (of course, within a fixed tolerance). From an implementation viewpoint, some complex structures are needed for implementing global optimization algorithms, such as the so-called symbolic expression trees (see Smith and Pantelides 1999 for their use in (global) optimization). For a detailed description of the issues arising in developing and implementing a global optimization algorithm the reader is referred to Liberti (2006).
The documentation of the specific solvers is also very useful. For an overview of the ways to define the functions f̲(x, y) and g̲(x, y) for nonconvex functions f and g with specific structure the reader is referred to Liberti (2004a, 2004b), McCormick (1976), Nowak (2005). Finally, it is worth mentioning that some of the primal heuristic algorithms developed for convex MINLPs have been generalized to the nonconvex setting by specifically taking into account the nonconvexity issues described above. Actually, this is a very active research topic on which several contributions are under way, see, e.g., Liberti et al. (2009b), D’Ambrosio et al. (2010), D’Ambrosio (2010), Nannicini and Belotti (2011), to mention a few.
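Among the predetermined operators mentioned above, the best-known case is McCormick’s (1976) envelope for a bilinear term w = xy over a box: four linear inequalities that contain every point (x, y, xy) of the box. A small numerical check of their validity (helper name hypothetical; a tolerance absorbs floating-point rounding):

```python
# McCormick (1976) envelope for the bilinear term w = x*y on the box
# [xL, xU] x [yL, yU]. The four linear inequalities below are valid for
# every point (x, y, x*y) of the box, so they yield a convex (here,
# polyhedral) relaxation of the nonconvex equation w = x*y.

def mccormick_holds(x, y, xL, xU, yL, yU, tol=1e-9):
    w = x * y
    return (w >= xL * y + x * yL - xL * yL - tol and   # from (x-xL)(y-yL) >= 0
            w >= xU * y + x * yU - xU * yU - tol and   # from (xU-x)(yU-y) >= 0
            w <= xU * y + x * yL - xU * yL + tol and   # from (xU-x)(y-yL) >= 0
            w <= xL * y + x * yU - xL * yU + tol)      # from (x-xL)(yU-y) >= 0

# sample the box on a grid and check that the envelope never cuts a point off
xL, xU, yL, yU = -1.0, 2.0, 0.5, 3.0
ok = all(mccormick_holds(xL + i * (xU - xL) / 10.0,
                         yL + j * (yU - yL) / 10.0,
                         xL, xU, yL, yU)
         for i in range(11) for j in range(11))
print(ok)
```

Each inequality comes from expanding one of the nonnegative products noted in the comments, which is why validity holds over the whole box; tightening the box by branching (Fig. 2) tightens the envelope.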

6 Mixed integer nonlinear programming solvers

This section is devoted to MINLP solvers. For each solver, we give a brief summary of its characteristics (often taken from manuals, web pages, and experience) and a list containing the following information:

– Implemented algorithms: the algorithms implemented within the solver. Examples of such algorithms are branch-and-bound (either standard or spatial), outer approximation, and branch-and-cut (see Sect. 5). If present, special tools aimed at speeding up the solver are mentioned, such as, e.g., bound reduction techniques.
– Type of algorithm: type of solution (exact/heuristic) and assumptions on the MINLP problem (for example, convexity of the MINLP). Recall that exact algorithms for convex MINLPs can be used as (generally very inaccurate) heuristics for the nonconvex case. We will point out if the solver has specific options to mitigate the inaccuracy due to the nonconvexities.
– Available interface: how the user can interface with the solver. The most common possibilities are: (i) the solver is available as a stand-alone application, which receives the description of the model as a file; (ii) the user can provide information about the MINLP problem to be solved by implementing some routines linked to the solver libraries; and (iii) the user can write the model in a modeling language environment which is directly linked to the solver and call the solver from the environment. Clearly, the lack of a fully accepted file format for NLPs impacts MINLPs as well. Some of the difficulties in expressing nonlinear functions are partially overcome by modeling languages (see Sect. 2). Moreover, a useful tool for the conversion of files within the most common file formats can be found at http://www.gamsworld.org/performance/paver/convert_submit.htm.

– Open-source software: “yes” if the software is open-source, i.e., the source code is available and can be modified by the user.
– Programming language: the programming language used to implement the software. This information is only given if the solver is open-source.
– Authors: the authors of the software, and the main published reference, if any.
– Web site: the web site of the solver, where information and the user manual can be found.
– Dependencies: the external software on which the solver depends if it uses (parts of) other software to exploit capabilities, such as, e.g., interfacing to specific environments, solving special classes of MINLPs (like NLP/MILP subproblems), or generating cuts. A notation like XX/YY indicates that the considered software relies either on software XX or on software YY to solve a special class of subproblems.

As extensively discussed in Sect. 5, methods for MINLP are usually based on decomposition/relaxation of the problem into simpler subproblems, which are typically NLPs and MILPs. For this reason, fundamental ingredients of MINLP solvers are often reliable and fast NLP and MILP solvers. The improvements in such solvers highly influence general-purpose software for MINLPs, and the recent success of MINLP solvers in solving real-world applications is also due to them. The kind of problems MINLP solvers are able to handle depends, in turn, on the NLP and MILP solvers integrated as part of the solving phase. From a development viewpoint, the use of available NLP and MILP solvers makes the work of a developer easier, and allows one to concentrate on aspects and issues typical of MINLP problems, exploiting more stable techniques for NLP and MILP problems (see, e.g., Bonami et al. 2007). Finally, the solvers are presented in alphabetical order.

6.1 ALPHA-ECP

ALPHA-ECP is an MINLP solver based on the extended cutting plane method. The ECP method is an extension of Kelley’s (1960) cutting plane algorithm, which was originally given for convex NLP problems. In Westerlund and Pettersson (1995) the method was extended to convex MINLP problems, and in Westerlund et al. (1998) to MINLP problems with f and g pseudo-convex functions. Pseudo-convex functions are those that satisfy the following property: ∇h(x*)(x − x*) < 0 if h(x) < h(x*).
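As an illustration of this property, h(x) = x³ + x is not convex (its second derivative 6x is negative for x < 0), yet it is pseudo-convex, because it is strictly increasing. The sketch below (helper names hypothetical) checks the defining implication numerically on a grid:

```python
# Numerical illustration of pseudo-convexity: h(x) = x**3 + x is not convex
# (h''(x) = 6x < 0 for x < 0) but it is pseudo-convex: whenever
# h(x) < h(x*), the gradient at x* points "away" from x, i.e.
# h'(x*) * (x - x*) < 0.

h = lambda x: x ** 3 + x
dh = lambda x: 3 * x ** 2 + 1            # derivative, strictly positive

def pseudo_convex_on_grid(pts):
    """Check the pseudo-convexity implication on all pairs of grid points."""
    return all(dh(xs) * (x - xs) < 0
               for x in pts for xs in pts if h(x) < h(xs))

pts = [i / 10.0 for i in range(-30, 31)]  # grid on [-3, 3]
print(pseudo_convex_on_grid(pts))         # True: the implication holds
print(any(6 * x < 0 for x in pts))        # True: h'' < 0 somewhere, not convex
```

This is exactly the class of functions for which the generalized ECP iterates of Westerlund et al. (1998) still converge to a global optimum, although the functions are not convex.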

6.2 BARON

BARON (Branch-and-Reduce Optimization Navigator) is a computational system for the solution of nonconvex optimization problems to global optimality. The solver combines branch-and-bound techniques specialized for nonconvex MINLPs with bound reduction based on both feasibility and optimality reasoning, which improves its performance. It is currently one of the most effective solvers in global optimization. In summary:

– Implemented algorithm: branch-and-reduce.
– Type of algorithm: exact.
– Available interface: AIMMS and GAMS.
– Open-source software: no.
– Authors: M. Tawarmalani and N.V. Sahinidis (Ref. Sahinidis 1996).
– Web site: http://archimedes.cheme.cmu.edu/baron/baron.html.
– Dependencies: IBM-CPLEX, either MINOS or SNOPT.

6.3 BONMIN

BONMIN (Basic Open-source Nonlinear Mixed INteger programming) is an open-source code for solving MINLP problems. It is distributed within COIN-OR (http://www.coin-or.org) under the Common Public License (CPL). There are several algorithmic options that can be selected within BONMIN. B-BB is an NLP-based branch-and-bound algorithm, B-OA is an outer-approximation decomposition algorithm, B-QG is an implementation of Quesada and Grossmann’s branch-and-cut algorithm, and B-Hyb is a hybrid outer-approximation based branch-and-cut algorithm. Some of the algorithmic choices require the ability to solve MILP and NLP problems. The default solvers for them are the COIN-OR codes CBC and IPOPT, respectively. The four algorithms above are exact for convex MINLPs. To solve (heuristically) a problem with nonconvex constraints, one should only use the branch-and-bound algorithm B-BB. Some options have been specifically designed in BONMIN to treat problems whose continuous relaxation is not convex. Specifically, in the context of nonconvex problems, the NLP solver may find different local optima when started from different starting points. Two options allow for solving the root node or each node of the tree with a user-specified number of different randomly chosen starting points, saving the best solution found. The function generating a starting point chooses a random point (uniformly) between the variable bounds. Because the solution given by the NLP solver does not give a valid lower bound, another option of BONMIN allows the user to change the fathoming rule so as to continue branching even when the solution value of the current node is worse than the best-known solution. In summary:

– Implemented algorithms: Branch-and-Bound, Outer-Approximation, LP/NLP based Branch-and-Bound, Hybrid.
– Type of algorithm: exact for convex MINLPs, with options for nonconvex MINLPs (see above).
– Available interface: AMPL, GAMS, stand-alone application (reading NL files), invocation through a C/C++ program, MATLAB interface through OPTIToolBox.
– Open-source software: yes.
– Programming language: C++.
– Authors: P. Bonami et al. (see the web page) (2008).
– Web site: https://projects.coin-or.org/Bonmin.
– Dependencies: CBC/IBM-CPLEX, IPOPT/FILTERSQP (besides Blas, HSL/MUMPS, optionally ASL, Lapack).

6.4 COUENNE

COUENNE (Convex Over and Under ENvelopes for Nonlinear Estimation) is a spatial branch-and-bound algorithm to solve MINLP problems. COUENNE aims at finding global optima of nonconvex MINLPs. It implements linearization, bound reduction, and branching methods within a branch-and-bound framework. Its main components are: (i) an “expression library”, i.e., a library of predetermined operators that can be used to reformulate general nonlinear functions; (ii) separation of linearization cuts; (iii) branching rules; and (iv) bound tightening methods. It is distributed within COIN-OR under CPL. In summary:

– Implemented algorithms: spatial branch-and-bound, bound reduction.
– Type of algorithm: exact.
– Available interface: AMPL, GAMS, stand-alone application (reading NL files), invocation through a C/C++ program.
– Open-source software: yes.
– Programming language: C++.
– Authors: P. Belotti et al. (see the web page) (2009).
– Web site: https://projects.coin-or.org/Couenne.
– Dependencies: CBC, IPOPT (besides Blas, HSL/MUMPS, optionally ASL, Lapack).

6.5 DICOPT

DICOPT (DIscrete and Continuous OPTimizer) is a software package for solving MINLP problems where the nonlinearities involve only continuous variables; binary and integer variables appear only in linear functions. The implementation is based on outer approximation, equality relaxation, and augmented penalty, i.e., equality constraints are relaxed to obtain inequalities, and a penalization factor is added to the objective function. The MINLP algorithm inside DICOPT solves a series of NLP and MILP subproblems. These subproblems can be solved using any NLP or MILP solver that runs under GAMS. The solver can handle nonconvexities, but in that case it does not necessarily obtain the global optimum. According to the webpage, the GAMS/DICOPT software has been designed with two main goals in mind: (i) to build on existing modeling concepts and to introduce a minimum of extensions to the existing modeling language; and (ii) to use existing optimizers to solve the subproblems. This concept favors an effective exploitation of any new development and enhancement in the NLP and MILP solvers. In summary:

– Implemented algorithms: outer approximation, equality relaxation, augmented penalty.
– Type of algorithm: exact for convex MINLPs with binary and integer variables appearing only in linear constraints.
– Available interface: GAMS.
– Open-source software: no.
– Authors: J. Viswanathan and I.E. Grossmann (1989).
– Web site: http://www.gams.com/dd/docs/solvers/dicopt.pdf.
– Dependencies: GAMS, IBM-CPLEX/XPRESS/XA, CONOPT/MINOS/SNOPT.

6.6 FILMINT

FILMINT is a solver for convex MINLPs. It is based on the LP/NLP branch-and-bound algorithm of Quesada and Grossmann (1992). FILMINT combines the MINTO branch-and-cut framework (Nemhauser et al. 1994) for MILP with FILTERSQP, used to solve the nonlinear programs that arise as subproblems in the algorithm. The MINTO framework allows one to employ cutting planes, primal heuristics, and other well-known MILP enhancements in the MINLP context. FILMINT offers techniques for generating and managing linearizations for a wide range of MINLPs. In summary:

– Implemented algorithm: LP/NLP based branch-and-bound.
– Type of algorithm: exact for convex MINLPs.
– Available interface: AMPL, stand-alone application (reading NL files).
– Open-source software: no.
– Authors: K. Abhishek, S. Leyffer, and J. Linderoth (2010).
– Web site: http://www.mcs.anl.gov/~leyffer/papers/fm.pdf.
– Dependencies: MINTO, FILTERSQP (besides ASL).

6.7 LAGO

LAGO (LAgrangian Global Optimizer) is a branch-and-cut algorithm to solve MINLPs. A linear outer approximation is constructed from a convex relaxation of the problem. Because LAGO does not have an algebraic representation of the problem, reformulation techniques for the construction of the convex relaxation cannot be applied, and sampling techniques are used in the case of non-quadratic nonconvex functions. The linear relaxation is further improved by general-purpose cutting planes. Bound reduction techniques are applied to improve efficiency. The algorithm assumes the existence of procedures for evaluating function values, gradients, and Hessians of the functions. According to the webpage, by relying on such black-box functions LAGO has the advantage of handling very general functions, but the disadvantage that advanced reformulations and bound reduction techniques cannot be used. Hence, when sampling methods are applied, deterministic global optimization is not guaranteed. LAGO is distributed within COIN-OR under CPL.⁴ In summary:

– Implemented algorithms: branch-and-cut (reformulation, relaxation, linear cut generator).
– Type of algorithm: exact for mixed integer quadratically constrained quadratic programming problems and convex MINLPs.
– Available interface: AMPL, GAMS, stand-alone application (reading NL files), invocation through a C/C++ program.
– Open-source software: yes.
– Programming language: C++.
– Authors: I. Nowak and S. Vigerske (2008).
– Web site: https://projects.coin-or.org/LaGO.
– Dependencies: IPOPT, CGL, CLP, optionally IBM-CPLEX (besides ASL/GAMSIO, METIS, ranlib, TNT, optionally Lapack).

6.8 LINGO and LINDOGlobal

LINGO is an optimization suite that handles virtually any mathematical programming problem, including MINLP. It implements branch-and-cut methods for global optimization but, in case a global solution is computationally too costly (or not ‘needed’), LINGO’s solution method can be turned into a heuristic that uses special features like multistart techniques for solving NLPs. The LINGO package also provides an internal modeling language and spreadsheet/database capabilities. LINDOGlobal is the GAMS version of the suite, which comes with a model size limitation of 3,000 variables and 2,000 constraints. In summary:

⁴ The development of LAGO has currently ceased, but the software is open-source, so we prefer to keep it in the list because future development is possible.

– Implemented algorithm: branch-and-cut.
– Type of algorithm: exact.
– Available interface: GAMS.
– Open-source software: no.
– Authors: LINDO Systems.
– Web site: http://www.lindo.com/, and http://www.gams.com/solvers/solvers.htm#LINDOGLOBAL.
– Dependencies (for LINDOGlobal): GAMS, CONOPT, optionally MOSEK.

6.9 MINLPBB

MINLPBB is a package of FORTRAN 77 subroutines for finding solutions to MINLP problems. The package implements the branch-and-bound method in a nonlinear setting. The package must be used in conjunction with FILTERSQP and BQPD (Nocedal and Wright 2006) for solving NLPs. Problems are specified in the same way as for FILTERSQP (i.e., either via subroutines or via AMPL or CUTE, Bongartz et al. 1995). The additional integer structure is specified using a vector to identify the indices of the integer variables. The user can influence the choice of the branching variable by providing priorities. In summary:

– Implemented algorithm: branch-and-bound.
– Type of algorithm: exact for convex MINLPs.
– Available interface: AMPL, CUTE, stand-alone application (reading NL files), MATLAB through TOMLAB.
– Open-source software: no.
– Authors: S. Leyffer.
– Web site: http://www-unix.mcs.anl.gov/~leyffer/solvers.html.
– Dependencies: FILTERSQP, BQPD (besides CUTE/ASL).

6.10 MINOPT

MINOPT is a package that comprises solvers for virtually any mathematical programming problem, including MINLP. Specifically, MINOPT provides implementations of GBD and variants of OA. In addition, the MINOPT algorithmic framework provides a modeling language and has connections to a number of solvers. In summary:

– Implemented algorithms: generalized Benders decomposition, outer approximation, generalized cross decomposition.
– Type of algorithm: exact.
– Available interface: stand-alone application, MINOPT modeling language.
– Open-source software: no.
– Authors: C.A. Schweiger and C.A. Floudas (1998a, 1998b).
– Web site: http://titan.princeton.edu/MINOPT.
– Dependencies: IBM-CPLEX/LPSOLVE, MINOS/NPSOL/SNOPT.

6.11 MINOTAUR

MINOTAUR (Mixed Integer Nonlinear Optimization Toolkit: Algorithms, Underestimators, Relaxations) is an open-source toolkit for solving mixed integer nonlinear optimization problems. It provides different solvers that implement state-of-the-art algorithms for MINLP. The MINOTAUR library can also be used to customize algorithms to exploit specific problem structures. In summary:

– Implemented algorithms: NLP based branch-and-bound for convex MINLPs, branch-and-bound for global optimization of mixed integer quadratically constrained quadratic programming problems.
– Type of algorithm: exact.
– Available interface: stand-alone application, AMPL.
– Open-source software: yes.
– Programming language: C++.
– Authors: A. Mahajan, S. Leyffer, J. Linderoth, J. Luedtke, and T. Munson.
– Web site: http://wiki.mcs.anl.gov/minotaur/index.php/Main_Page.
– Dependencies: Lapack (and optionally CLP, IBM-CPLEX, FILTERSQP, IPOPT with MUMPS/HSL).

6.12 SBB

SBB is a GAMS solver for MINLP models. It is based on a combination of the standard branch-and-bound method for MILP and some of the standard NLP solvers already supported by GAMS. Currently, SBB can use CONOPT, MINOS, and SNOPT as solvers for subproblems. SBB supports all types of discrete variables supported by GAMS, including binary, integer, semicontinuous, semiinteger, SOS1, and SOS2. In summary:

– Implemented algorithm: branch-and-bound.
– Type of algorithm: exact for convex MINLPs. In the nonconvex case one can limit the fathoming of infeasible nodes.
– Available interface: GAMS.
– Open-source software: no.
– Authors: ARKI Consulting and Development (Bussieck and Drud).
– Web site: http://www.gams.com/solvers/solvers.htm#SBB.
– Dependencies: GAMS, CONOPT/MINOS/SNOPT.

6.13 SCIP

SCIP (Solving Constraint Integer Programs) is a solver developed by the Optimization Department at the Zuse Institute Berlin (ZIB) and its collaborators. For academic institutions, it is available in source code and as a standalone binary, and it is distributed within GAMS. SCIP ensures globally optimal solutions for convex and nonconvex MINLPs. It implements a spatial branch-and-bound algorithm that uses LPs for bounding. Similarly to BARON, the outer approximation is generated from a reformulation of the MINLP. Note that the support for non-quadratic nonlinear constraints is still in a beta stage and not yet as robust as the rest of SCIP. In summary:

– Implemented algorithms: spatial branch-and-bound, bound tightening.
– Type of algorithm: exact.
– Available interface: GAMS, invocation through a C program.
– Open-source software: yes.
– Programming language: C.
– Authors: S. Vigerske for the MINLP extension (Vigerske 2012; Achterberg 2007; Berthold et al. 2012).
– Web site: http://scip.zib.de/.
– Dependencies: SoPlex/IBM-CPLEX, CppAD (optionally IPOPT).

7 NEOS, server for optimization

Most of the solvers described in the previous sections can be freely accessed on the Internet through “NEOS (Network-Enabled Optimization System): Server for Optimization” (NEOS), maintained by the Optimization Technology Center of Northwestern University, the Argonne National Laboratory, and the University of Wisconsin-Madison. A user can submit an optimization problem to the server and obtain the solution and running time statistics using the preferred solver through different interfaces. In particular, three ways to interact with the server are available:

1. Web Interface: from the web site the user can submit a problem, typically written in the AMPL or GAMS modeling languages, and obtain results and statistics via email.
2. Client Interface: the user can communicate with the server using a client implementation in different programming languages such as Python, Perl, PHP, C, C++, Java, Ruby (see NEOS and XML-RPC for details).
3. Kestrel Interface: the user can submit a problem to the server and use the results within a modeling environment. On the client machine, an executable file is called in order to directly communicate with the server within the AMPL or GAMS environment (see http://www-neos.mcs.anl.gov/neos/kestrel.html for details).

The chance to access different solvers is a great opportunity for the optimization community, both because a user can test the performance of different solvers and decide which one is most suitable for a specific model, and because some of the available solvers are commercial solvers that cannot otherwise be used by users who do not own a license. The uniformity of the interfaces available for the different solvers makes switching from one solver to another easy, without any additional work. Moreover, the server is intuitive to use and does not require deep expertise in optimization.
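As a sketch of the client interface, a NEOS job is described by an XML document and submitted over XML-RPC (Python’s standard library suffices). The tag names and server address below follow the NEOS AMPL submission template as we recall it, and should be checked against the current NEOS documentation before use; the network call is left as a comment so the sketch is self-contained:

```python
# Sketch of a NEOS client-interface submission via XML-RPC.
# CAUTION: the XML tag names ("category", "solver", "inputMethod", ...) and
# the server URL are assumptions based on the NEOS AMPL template; verify
# them against the NEOS documentation.
import xmlrpc.client

MODEL = """
var x integer >= 0, <= 5;
var y >= 0, <= 3;
minimize obj: (x - 2.6)^2 + (y - 1.3)^2;
"""

def make_job(category, solver, model):
    """Build the XML job description expected by the NEOS server."""
    return ("<document>"
            f"<category>{category}</category>"
            f"<solver>{solver}</solver>"
            "<inputMethod>AMPL</inputMethod>"
            f"<model><![CDATA[{model}]]></model>"
            "<data><![CDATA[]]></data>"
            "<commands><![CDATA[solve; display x, y;]]></commands>"
            "</document>")

job = make_job("minco", "Bonmin", MODEL)

# Actual submission (requires network access):
# neos = xmlrpc.client.ServerProxy("https://neos-server.org:3333")
# job_number, password = neos.submitJob(job)
# print(neos.getFinalResults(job_number, password).data.decode())
print(job[:40])
```

The same XML document can be reused with a different `solver` tag to compare solvers on one model, which is precisely the opportunity the section describes.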

8 Conclusions

In this paper we have presented an overview of software for solving mixed integer nonlinear programming problems. We have introduced such problems and their theoretical and practical difficulties. Because the solution of MINLPs often requires the solution of NLP and/or MILP subproblems, we have briefly discussed the relationships among these problems and mentioned their specific methods. We have introduced the most important algorithms and the most commonly used solvers for MINLPs, pointing out their crucial characteristics. For a reader interested in testing the developed algorithms on literature test instances, among the libraries of MINLP problems we cite the most widely used: the CMU/IBM Library (http://egon.cheme.cmu.edu/ibm/page.htm), the new CMU-IBM Cyber-Infrastructure for MINLP collaborative site (http://www.minlp.org), the MacMINLP library maintained by Leyffer (http://www.mcs.anl.gov/~leyffer/macminlp/index.html), and the MINLPlib (http://www.gamsworld.org/minlp/minlplib.htm). Finally, we point out that benchmark results on optimization solvers, which are beyond the scope of this paper, can be found at the web sites http://plato.asu.edu/bench.html and http://www.gamsworld.org.

Acknowledgements We are grateful to Silvano Martello for precious suggestions. Thanks are also due to Stefan Vigerske for useful comments and discussions.

4OR-Q J Oper Res DOI 10.1007/s10288-010-0118-8

PHD THESIS

Application-oriented mixed integer non-linear programming

Claudia D’Ambrosio

Received: 9 January 2010 / Revised: 17 January 2010 © Springer-Verlag 2010

Abstract This is a summary of the author's PhD thesis supervised by Andrea Lodi and defended on 16 April 2009 at the University of Bologna. The thesis is written in English and available for download at http://www.or.deis.unibo.it/staff_pages/dambrosio/Phd_Th_DAmbrosio.tar.gz. The main topic of the thesis is Mixed Integer Non-Linear Programming, with a focus on non-convex problems (i.e., problems for which the feasible region of the continuous relaxation is a non-convex set) and real-world applications. Different kinds of algorithms are presented: linearization methods, heuristic and global optimization algorithms. Also, different kinds of real-world applications are solved, arising, for example, from Hydraulic and Electrical Engineering problems. The last part of the thesis is devoted to software and tools for mixed integer non-linear programming problems.

Keywords Mixed integer non-linear programming · Real-world application · Non-convex · Algorithms · MINLP tools

Mathematics Subject Classification (2000) 90C11 · 90C26 · 90C90 · 65K05

1 Introduction

In recent years there has been renewed interest in mixed integer non-linear programming (MINLP) problems, the main motivations being: (i) the greatly improved performance of solvers handling non-linear constraints; (ii) the awareness that most real-world applications can be modeled as MINLP problems; (iii) the challenging nature of this very general class of problems.

C. D’Ambrosio (B) DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy e-mail: [email protected]

Besides being hard from a theoretical viewpoint, MINLPs are, in general, also very difficult to solve in practice. We address non-convex MINLPs, i.e., problems having non-convex continuous relaxations: the presence of non-convexities within the model usually makes them even harder to solve. The aim of this PhD thesis is to examine different approaches that one can use to attack MINLP problems with non-convexities: linearizations and approximations, heuristic methods, and global optimization algorithms. Special attention is given to real-world problems. In the introductory part of the thesis we introduce the problem and present the three most important special cases of general MINLPs, together with the most effective methods used to solve them; these techniques play a fundamental role in the resolution of general MINLP problems. We then describe algorithms addressing general MINLPs. The main contributions of the thesis are described in the next two sections, where methods for MINLPs and real-world applications are presented. The last part of the thesis is devoted to software and tools useful for solving MINLP problems.

2 Modeling and solving non-convexities

Four different methods, aimed at solving different classes of MINLP problems, are presented.

A feasibility pump heuristic for non-convex MINLPs We present a new feasibility pump (FP) algorithm tailored for general non-convex MINLP problems. Differences with the previously proposed FP algorithms and difficulties arising from non-convexities in the models are extensively discussed. The basic idea of FP iterative algorithms is that, at each iteration, two simpler subproblems are solved, and the distance between the solutions of the two subproblems decreases. The main methodological innovations of this variant are: (a) the first subproblem is a non-convex continuous Non-Linear Program, which is solved using heuristic techniques; (b) the solution method for the second subproblem, a Mixed Integer Linear Program (MILP), is complemented by a tabu list. We exhibit computational results showing the good performance of the algorithm on instances taken from MINLPLib. Moreover, for the first time we point out the strong relationship between no-good and interval-gradient cuts, later formalized in D’Ambrosio et al. (2009).

A global optimization method for a class of MINLP problems We propose a global optimization algorithm for separable non-convex MINLPs, i.e., MINLPs in which the non-convex functions involved are sums of univariate functions. There are many problems that are already in such a form, or can be brought into such a form via some simple substitutions. We developed a simple algorithm, implemented at the level of a modeling language, to attack such separable problems. First, we identify subintervals of convexity and concavity for the univariate functions using external tools for symbolic analysis. With such an identification at hand, we develop a convex MINLP relaxation of the problem. We work on each subinterval of convexity and concavity separately, using a linear relaxation only on the "concave side" of each function on the subintervals. The subintervals are glued together using binary variables. Next, we repeatedly refine our convex MINLP relaxation by modifying it at the modeling level. Finally, by fixing the integer variables in the original non-convex MINLP, and then locally solving the associated non-convex Non-Linear Programming (NLP) problem, we get an upper bound on the global minimum. Preliminary computational experiments on different instances are presented in D’Ambrosio et al. (2009a,b).
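The first step of this method, identifying the subintervals of convexity and concavity of each univariate function, is done in the thesis with external symbolic-analysis tools; a purely numerical stand-in (an illustrative assumption, not the thesis implementation) can be sketched as follows:

```python
# Sketch: detect subintervals of convexity and concavity of a univariate
# function by sampling the sign of a central finite difference, which
# approximates h**2 * f''(x). Illustrative stand-in only: the thesis
# identifies these subintervals via symbolic analysis, not sampling.

def convexity_subintervals(f, lo, hi, n=10000):
    """Return a list of (a, b, 'convex' | 'concave') pieces covering [lo, hi]."""
    h = (hi - lo) / n

    def curvature_sign(x):
        # sign of f(x-h) - 2 f(x) + f(x+h) ~ h^2 f''(x); ties count as convex
        return 1 if f(x - h) - 2 * f(x) + f(x + h) >= 0 else -1

    pieces = []
    start, cur = lo, curvature_sign(lo + h)
    for i in range(2, n):
        x = lo + i * h
        s = curvature_sign(x)
        if s != cur:
            pieces.append((start, x, 'convex' if cur > 0 else 'concave'))
            start, cur = x, s
    pieces.append((start, hi, 'convex' if cur > 0 else 'concave'))
    return pieces

# x**3 is concave for x <= 0 and convex for x >= 0:
pieces = convexity_subintervals(lambda x: x ** 3, -1.0, 1.0)
```

Each detected piece would then receive either the function itself (convex side) or a secant/linear relaxation (concave side), glued with binary variables as described above.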

Approximating non-linear functions of two variables We consider three meth- ods for the piecewise linear approximation of functions of two variables for inclusion within MILP models. The simplest one applies the classical one-variable technique over a discretized set of values of the second independent variable. A more complex approach is based on the definition of triangles in the three-dimensional space. The third method we describe can be seen as an intermediate approach, recently used within an applied context, which appears particularly suitable for MILP modeling. We show that the three approaches do not dominate each other, and give a detailed description of how they can be embedded in a MILP model. Advantages and drawbacks of the three methods are discussed on the basis of some numerical examples. These results are presented in D’Ambrosio et al. (2010).
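The triangle-based approach can be illustrated by a minimal evaluator (a hypothetical sketch, not code from the thesis): each grid rectangle is split into two triangles, and f is interpolated linearly on each triangle.

```python
# Sketch: evaluate a triangulated piecewise-linear approximation of f(x, y).
# Hypothetical helper for illustration; the thesis discusses how to embed
# such approximations into MILP models, which is not shown here.

def pwl2_triangle(f, xs, ys, x, y):
    """Evaluate the triangulated PWL approximation of f at (x, y)."""
    i = max(j for j in range(len(xs) - 1) if xs[j] <= x)  # cell column
    k = max(j for j in range(len(ys) - 1) if ys[j] <= y)  # cell row
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[k], ys[k + 1]
    u, v = (x - x0) / (x1 - x0), (y - y0) / (y1 - y0)  # local coordinates
    if u + v <= 1:  # lower-left triangle: vertices (x0,y0), (x1,y0), (x0,y1)
        return (f(x0, y0)
                + u * (f(x1, y0) - f(x0, y0))
                + v * (f(x0, y1) - f(x0, y0)))
    # upper-right triangle: vertices (x1,y1), (x0,y1), (x1,y0)
    return (f(x1, y1)
            + (1 - u) * (f(x0, y1) - f(x1, y1))
            + (1 - v) * (f(x1, y0) - f(x1, y1)))

xs = ys = [0.0, 0.5, 1.0]
exact_at_grid = pwl2_triangle(lambda a, b: a * b, xs, ys, 0.5, 0.5)  # = 0.25
approx = pwl2_triangle(lambda a, b: a * b, xs, ys, 0.25, 0.25)
```

The approximation is exact at grid points; at (0.25, 0.25) the lower triangle gives 0 while the true value of a*b is 0.0625, illustrating the approximation error away from the grid.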

NLP-based heuristics for MILP problems We present preliminary computational results on heuristics for Mixed Integer Linear Programming. A heuristic for hard MILP problems based on NLP techniques is presented: the peculiarity of our approach to MILP problems is that we reformulate the integrality requirements, treating them within a non-convex objective function, ending up with a mapping from the MILP feasibility problem to one or more NLP problems.

3 Applications

A relevant part of the thesis is devoted to real-world applications: two different problems and approaches to MINLPs are presented, namely scheduling and unit commitment for hydro-plants, and water network design problems. The results show that each of these different methods has advantages and disadvantages.

Hydro scheduling and unit commitment We study a unit commitment problem of a generation company whose aim is to find the optimal scheduling of a multi-unit pump-storage hydro power station, for a short-term period in which the electricity prices are forecasted. The problem has a mixed-integer non-linear structure, which makes the corresponding mathematical models very hard to handle. However, modern MILP software tools have reached a high efficiency, both in terms of solution accuracy and computing time. Hence we introduce MILP models of increasing complexity, which allow one to accurately represent most of the characteristics of the hydro-electric system and turn out to be computationally solvable. In particular, we present a model that takes into account the head effects on power production through an enhanced linearization technique, and turns out to be more general and efficient than those available in the literature. The practical behavior of the models is analyzed through computational experiments on real-world data, which are reported in Borghetti et al. (2008).

Water network design problem We present a solution method for a water-network optimization problem using a non-convex continuous NLP relaxation and a MINLP search. Our approach employs a relatively simple and accurate model that pays some attention to the requirements of the solvers that we employ. In this way, we exploit the complicated and effective algorithms implemented in the MINLP solver for calculating good feasible solutions. We report successful computational experience using available open-source MINLP software on problems from the literature and on difficult real-world instances. The method is extensively described in Bragalli et al. (2006, 2008).

4 Tools for MINLP

The last part of the thesis contains a review of tools commonly used for general MINLP problems. We start by describing the main characteristics of solvers for the most commonly used special cases of MINLP. Then we examine solvers for general MINLPs: for each solver a brief description, taken from the manuals, is given together with a schematic table containing the most important information, such as the class of problems they address, the implemented algorithms, and the dependencies on external software. Tools for MINLP, especially open-source software, constituted an important part of the development of this PhD thesis. The great majority of the methods presented in Sects. 2 and 3 were developed entirely using such solvers. A notable example of the importance of open-source solvers is the resolution method for the Water Network Design problem described in Sect. 3: we present an algorithm for solving this non-convex MINLP problem using the open-source software BONMIN. The solver was originally tailored for convex MINLPs; however, some accommodations were made to handle non-convex problems, and they were developed and tested in the context of the thesis, in which additional details on these features can be found.

References

Borghetti A, D’Ambrosio C, Lodi A, Martello S (2008) An MILP approach for short-term hydro scheduling and unit commitment with head-dependent reservoir. IEEE Trans Power Syst 23(3):1115–1124
Bragalli C, D’Ambrosio C, Lee J, Lodi A, Toth P (2006) An MINLP solution method for a water network problem. In: Azar Y, Erlebach T (eds) Algorithms – ESA 2006 (14th Annual European Symposium, Zurich, Switzerland, September 2006), Lecture Notes in Computer Science. Springer, pp 696–707
Bragalli C, D’Ambrosio C, Lee J, Lodi A, Toth P (2008) Water network design by MINLP. IBM Research Report RC24495
D’Ambrosio C, Frangioni A, Liberti L, Lodi A (2009) On interval-subgradient and no-good cuts. Technical Report OR-09-09, University of Bologna
D’Ambrosio C, Lee J, Wächter A (2009a) An algorithmic framework for MINLP with separable non-convexity. IBM Research Report RC24810
D’Ambrosio C, Lee J, Wächter A (2009b) A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity. In: Fiat A, Sanders P (eds) ESA 2009 (17th Annual European Symposium, Copenhagen, Denmark, September 2009), Lecture Notes in Computer Science 5757. Springer, Berlin, pp 107–118
D’Ambrosio C, Lodi A, Martello S (2010) Piecewise linear approximation of functions of two variables in MILP models. Oper Res Lett 38(1):39–46

On a nonconvex MINLP formulation of the Euclidean Steiner tree problem in n-space

Claudia D’Ambrosio1, Marcia Fampa2, Jon Lee3, and Stefan Vigerske4

1 LIX CNRS (UMR7161), École Polytechnique, 91128 Palaiseau Cedex, France. [email protected]
2 Instituto de Matemática and COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil. [email protected]
3 IOE Department, University of Michigan, Ann Arbor, Michigan, USA. [email protected]
4 ZIB (Konrad-Zuse-Zentrum für Informationstechnik Berlin), Berlin, Germany. [email protected]

Abstract. The Euclidean Steiner Tree Problem in dimension greater than 2 is notoriously difficult. Successful methods for exact solution are not based on mathematical optimization; rather, they involve very sophisticated enumeration. There are two types of mathematical-optimization formulations in the literature, and it is an understatement to say that neither scales well enough to be useful. We focus on a known nonconvex MINLP formulation. Our goal is to make some first steps in improving the formulation so that large instances may eventually be amenable to solution by a spatial branch-and-bound algorithm. Along the way, we developed a new feature, which we incorporated into the global-optimization solver SCIP and made accessible via the modeling language AMPL, for handling piecewise-smooth univariate functions that are globally concave.

1 Introduction

The Euclidean Steiner tree problem (ESTP) in ℝⁿ is: given a finite set of points in ℝⁿ, find a tree of minimal Euclidean length spanning these points, possibly using additional points. The original points are terminals, and the additional nodes in the tree are Steiner points. The ESTP is NP-Hard [9], and interest in the problem stems from both the mathematical challenge and its potential applications (e.g., communications, infrastructure networks). In biology, [3] gives an application of the ESTP to phylogenetic analysis (i.e., the construction of evolutionary trees). Basic properties of an optimal solution, called a Steiner minimal tree (SMT), are: (i) a Steiner point in an SMT has degree 3; a Steiner point and its adjacent nodes lie in a plane, and the angles between the edges connecting the point to its adjacent nodes are 120 degrees; (ii) a terminal in an SMT has degree between 1 and 3; (iii) an SMT on p terminals has at most p − 2 Steiner points (see [12,5]). The topology of a Steiner tree is the tree in which we have fixed the number of Steiner points and the edges between all points, but not the positions of the Steiner points. A topology is a Steiner topology if each Steiner point has degree 3 and each terminal has degree 3 or less. A Steiner topology with p terminals is a full Steiner topology if there are p − 2 Steiner points and each terminal has degree 1. A full Steiner tree is a Steiner tree corresponding to a full Steiner topology. A Steiner tree corresponding to some topology, but with certain edges shrunk to zero length, is degenerate. Any SMT with a non-full Steiner topology can be associated with a full Steiner topology for which the tree is degenerate. Many papers have addressed exact solution in ℝ², and impressive results were obtained with the GeoSteiner algorithm [18]. But these algorithms cannot be applied when n ≥ 3, and only a few papers have considered exact solution in this case.
[11] proposed solving the problem in ℝⁿ by enumerating all Steiner topologies and computing a min-length tree associated with each topology; in practice, this can only solve very small instances because of the fast growth of the number of topologies as p increases. A branch-and-bound (b&b) algorithm for finding SMTs in ℝⁿ was proposed by Smith [14]. He presented a scheme for implicitly enumerating all full Steiner topologies on a given set of terminals, and he gave computational results sufficient to disprove, for all 3 ≤ n ≤ 9, an important conjecture of Gilbert and Pollak on the "Steiner ratio". Fampa and Anstreicher [6] used Smith's enumeration scheme and proposed a conic formulation for the problem of locating the Steiner points for a given topology, used to obtain a lower bound on the minimal tree length and to implement a "strong branching" technique. [15] presented geometric conditions that are satisfied by Steiner trees with a full topology and applied those conditions to eliminate candidate topologies in Smith's scheme. The best computational results for the ESTP for n ≥ 3 are presented in these last two papers. None of the works mentioned above considered a mathematical-programming formulation for the ESTP; such formulations were presented only in [13] and [7]. [13] formulated the ESTP as a nonconvex mixed-integer nonlinear programming (MINLP) problem and proposed a b&b algorithm using Lagrangian dual bounds. [7] presented a convex MINLP formulation that could be implemented in a b&b algorithm using bounds computable from conic problems. Both formulations use 0/1 variables to indicate whether the edge connecting two nodes is present in a Steiner topology. The presence of these 0/1 variables leads to a natural branching scheme; however, neither [13] nor [7] presents computational results. We did some preliminary experiments with the nonconvex model. We solved some randomly generated instances using SCIP [1,16], and the results are dismal.
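The "fast growth of the number of topologies" can be made concrete: the number of full Steiner topologies on p terminals is the double factorial (2p − 5)!!, a classical count going back to Gilbert and Pollak. The snippet below only illustrates this growth; it is not part of any formulation in the paper.

```python
# Sketch: the number of full Steiner topologies on p terminals is the double
# factorial (2p - 5)!! = 1 * 3 * 5 * ... * (2p - 5), a classical count; it
# illustrates why explicit enumeration of topologies only works for tiny p.

def full_steiner_topologies(p):
    assert p >= 3
    count = 1
    for odd in range(1, 2 * p - 4, 2):  # 1, 3, ..., 2p - 5
        count *= odd
    return count

growth = [full_steiner_topologies(p) for p in range(3, 11)]
# growth == [1, 3, 15, 105, 945, 10395, 135135, 2027025]
```

Already at p = 10 there are over two million full topologies, which motivates implicit enumeration and optimization-based approaches.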
Two difficulties observed have motivated this research: the weakness of the lower bounds given by the relaxations, and non-differentiability at points where the solution degenerates. In what follows, we investigate strategies to deal with these difficulties: (i) the use of approximate differentiable functions for the Euclidean norm, and (ii) nonconvex cuts based on geometric considerations.

2 A Nonconvex MINLP formulation

[13] formulates the ESTP as a nonconvex MINLP problem, first defining a special graph G = (V, E). Let P := {1, 2, ..., p} be the indices associated with the given terminals a¹, a², ..., aᵖ, and let S := {p+1, p+2, ..., 2p−2} be the indices associated with the Steiner points x^{p+1}, x^{p+2}, ..., x^{2p−2}. Let V = P ∪ S. Denote by [i, j] an edge of G, with i, j ∈ V such that i < j. Define E := E1 ∪ E2, where E1 := {[i, j] : i ∈ P, j ∈ S} and E2 := {[i, j] : i ∈ S, j ∈ S}. Define a 0/1 variable y_ij for each edge [i, j] ∈ E, where y_ij = 1 if the edge [i, j] is present in the SMT and 0 otherwise. The ESTP is then formulated as

(MMX)  min  Σ_{[i,j]∈E1} ‖aⁱ − xʲ‖ y_ij + Σ_{[i,j]∈E2} ‖xⁱ − xʲ‖ y_ij,   (1)
s.t.   Σ_{j∈S} y_ij = 1,  for i ∈ P,   (2)
       Σ_{i∈P} y_ij + Σ_{k<j, k∈S} y_kj + Σ_{k>j, k∈S} y_jk = 3,  for j ∈ S,   (3)
       Σ_{k<j, k∈S} y_kj = 1,  for j ∈ S − {p+1},   (4)

with xʲ ∈ ℝⁿ for j ∈ S and y_ij ∈ {0, 1} for [i, j] ∈ E, as defined above.
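For intuition, evaluating objective (1) for a fixed topology y and fixed point positions is straightforward. The tiny instance below (p = 3 terminals and one Steiner point, indexed 4) is a made-up illustration, not an instance from the paper:

```python
# Sketch: evaluating the MMX objective (1) for fixed positions and topology.
# Variable names follow the formulation; the instance is illustrative only.
import math

def mmx_objective(a, x, y):
    """a: dict of terminal points, x: dict of Steiner points,
    y: dict {(i, j): 0/1} over edges [i, j] of E1 and E2 (with i < j)."""
    def dist(u, v):
        return math.sqrt(sum((ul - vl) ** 2 for ul, vl in zip(u, v)))
    total = 0.0
    for (i, j), yij in y.items():
        first = a[i] if i in a else x[i]  # E1 edges start at a terminal
        total += yij * dist(first, x[j])  # the second endpoint is a Steiner point
    return total

a = {1: (1.0, 0.0), 2: (-1.0, 0.0), 3: (0.0, 1.0)}
x = {4: (0.0, 0.0)}                    # single Steiner point at the origin
y = {(1, 4): 1, (2, 4): 1, (3, 4): 1}  # star topology: every terminal joined to it
length = mmx_objective(a, x, y)        # 1.0 + 1.0 + 1.0
```

Optimizing over x for a fixed y is exactly where the non-differentiability of the norm at degenerate (zero-length) edges becomes an issue, as discussed next.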

3 Dealing with the non-differentiability

The continuous relaxation of MMX is a nonconvex NLP problem. Convergence of most NLP solvers (e.g., Ipopt [17]) requires that functions be twice continuously differentiable. This is not the case for MMX, due to the non-differentiability of the Euclidean norm when the solution degenerates (i.e., when the norm is zero); and it is easy to see examples where the optimal solution does degenerate. In water-network optimization, [2] smooths away the non-differentiability at zero of another function (modeling the pressure drop due to friction of the turbulent flow of water in a pipe). There are different ways to deal with the non-differentiability that we face. Let w(xⁱ − xʲ) := ‖xⁱ − xʲ‖², so that ‖xⁱ − xʲ‖ = √(w(xⁱ − xʲ)). In this way, we can focus on √·, the source of the non-differentiability.

3.1 Implicit square roots

One possibility is to introduce an auxiliary variable z, and use the additional inequality −z² + w ≤ 0 together with the nonnegativity constraint z ≥ 0. In this way, any optimal solution will have z = √w. This looks possibly attractive, but the overhead of so many additional nonnegatively-constrained variables and the difficulty of many additional nonconvex constraints is prohibitive. On the other hand, while nonconvex, these functions −z² + w manifest themselves as −z²_ij + Σ_{l=1}^{n} v_l², with v = xⁱ − xʲ. The nonconvexity in −z²_ij + Σ_{l=1}^{n} v_l² is isolated to −z²_ij, so it may be possible to adapt the techniques of [4] (exploiting concave separability), though we have to deal with the multiplication of the distance variables z_ij by the 0/1 variables y_ij in the objective function of MMX. Another view is that −z² + w ≤ 0 is equivalent to √w ≤ z, which manifests itself as the second-order cone constraint ‖xⁱ − xʲ‖ ≤ z_ij. We can try to exploit methods for handling such constraints, though again we have to deal with the multiplication of the distance variables z_ij by the 0/1 variables y_ij in the objective function.

3.2 Shifting

A simple fix is to approximate √w by h(w) := √(w + δ) − √δ for some small δ > 0. Then we underestimate all positive distances (via the triangle inequality). Because our objective function is increasing in each distance, we get a relaxation of MMX. But a strong downside is that we underestimate all positive distances, and the error in each distance calculation rapidly approaches √δ as w increases.
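The behavior of h can be checked numerically; the sketch below uses an arbitrary illustrative δ:

```python
# Sketch: the "shifting" approximation h(w) = sqrt(w + delta) - sqrt(delta).
# It is smooth at w = 0 and underestimates sqrt(w), with an error approaching
# sqrt(delta) as w grows; delta here is an arbitrary illustrative value.
import math

delta = 1e-4

def h(w):
    return math.sqrt(w + delta) - math.sqrt(delta)

errors = [math.sqrt(w) - h(w) for w in (0.0, 0.01, 1.0, 100.0)]
# errors are nonnegative (underestimation) and tend to sqrt(delta) = 0.01
```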

3.3 Linear extrapolation

The following approximation, depending on the choice of a σ > 0, was proposed in [10] to avoid non-differentiability of the Euclidean-distance function for the "traveling-salesman problem with neighborhoods":

l(w) := √w, if w ≥ σ²;    l(w) := σ/2 + w/(2σ), if w ≤ σ².

The function l is well-defined at w = σ². It is analytic except when w = σ², and in this case it is still differentiable once. In fact, l simply uses the tangent at w = σ² of the graph of the strictly concave function √w to overestimate √w on [0, σ²). We already see that because l is not twice continuously differentiable, we should not expect good behavior from most NLP solvers. A strong shortcoming for our context is that l miscalculates zero distances; that is, l(0) = σ/2, while obviously √0 = 0. Because l(w) is an upper bound on √w, and because our objective function is increasing in distances, using the approximation l we do not get a relaxation of MMX. Moreover, for degenerate Steiner trees, we will systematically overestimate distances that should be zero.
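These two claims about l, overestimation below σ² and l(0) = σ/2, can be checked with a short sketch (σ is an arbitrary illustrative value):

```python
# Sketch: the linear extrapolation l overestimates sqrt(w) on [0, sigma^2)
# (the tangent of a concave function lies above its graph) and miscalculates
# the zero distance: l(0) = sigma / 2 > 0 = sqrt(0).
import math

def l(w, sigma):
    return math.sqrt(w) if w >= sigma ** 2 else sigma / 2 + w / (2 * sigma)

sigma = 0.1
at_zero = l(0.0, sigma)  # sigma / 2, not 0
overestimates = all(l(w, sigma) >= math.sqrt(w) for w in (0.0, 0.002, 0.005, 0.009))
```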

3.4 Smooth under-estimation

We propose another piecewise smoothing, using a particular homogeneous cubic depending on the choice of a λ > 0, that has very nice properties:

c(w) := √w, if w ≥ λ²;    c(w) := (15/(8λ))w − (5/(4λ³))w² + (3/(8λ⁵))w³, if w ≤ λ².

We have depicted all of these smoothings in Fig. 1.

Fig. 1: Behavior of all smoothings (λ² = σ² = 0.01):

– the solid curve is the true √· function;
– the "smooth underestimation" c, which we advocate, follows the dotted curve below w = 0.01;
– the "linear extrapolation" l follows the dot-dashed line below w = 0.01;
– both piecewise-defined functions c and l follow the true √· function above w = 0.01;
– the "shift" h (with δ = (4λ/15)² chosen so that it has the same derivative as c at 0) follows the consistent underestimate given by the dashed curve.

The next result makes a very strong case for the smoothing c.

Thm. 1.

1. c(w) agrees with √w in value at w = 0;
2. c(w) agrees with √w in value, derivative and second derivative at w = λ²; hence c is twice continuously differentiable;
3. c is strictly concave on [0, +∞) (just like √·);
4. c is strictly increasing on [0, +∞) (just like √·); consequently, c(‖xⁱ − xʲ‖²) is quasiconvex (just like ‖xⁱ − xʲ‖);
5. √w − c(w) ≥ 0 on [0, +∞);
6. for all λ > 0, max{(√w − c(w))/λ : w ∈ [0, +∞)} is the real root γ of −168750 + 1050625x + 996300x² + 236196x³, which is approximately 0.141106.
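Items 1, 5, and 6 of Thm. 1 can be verified numerically; the grid-based check below is a sketch (the gap √w − c(w) vanishes for w ≥ λ², so sampling [0, λ²] suffices):

```python
# Numerical check (a sketch) of items 1, 5, and 6 of Thm. 1 for the cubic
# smoothing c: c(0) = 0, c underestimates sqrt, and the worst-case gap
# (sqrt(w) - c(w)) / lam is about 0.141106. The gap is zero for w >= lam^2,
# so a fine grid on [0, lam^2] suffices.
import math

def c(w, lam):
    if w >= lam ** 2:
        return math.sqrt(w)
    return 15 * w / (8 * lam) - 5 * w ** 2 / (4 * lam ** 3) + 3 * w ** 3 / (8 * lam ** 5)

lam, n = 1.0, 200000
grid = [i * lam ** 2 / n for i in range(n + 1)]
underestimates = all(math.sqrt(w) >= c(w, lam) - 1e-12 for w in grid)
gamma = max((math.sqrt(w) - c(w, lam)) / lam for w in grid)  # ~ 0.141106
```

By the scaling c(λ²t, λ) = λ·c(t, 1), the quantity (√w − c(w))/λ does not depend on λ, consistent with item 6.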

Because distances only appear in the objective, and in a very simple manner, the approximation with c yields a relaxation (due to (5) of Thm. 1): the objective value of a global optimum using the distance approximation is a lower bound on the true global optimum, and plugging the obtained solution into the true objective function gives an upper bound on the value of a true global optimum. So, in the end, we get a solution and a bound on how close to optimal we can certify it to be. To be precise, for any λ > 0, let MMX(λ) denote MMX with all square roots in the norms replaced by the function c.

Cor. 2. The optimal value of MMX is between the optimal value of MMX(λ) and the optimal value of MMX(λ) plus λγ(2p − 3) (γ ≈ 0.141106).

We note that the upper bound of Cor. 2 is a very pessimistic worst case — only achievable when the optimal tree has a full Steiner topology and all edges are very short. If such were the case, certainly λ should be decreased. Furthermore, we emphasize that if the length of every edge in the SMT that solves MMX(λ) is either zero (degenerate case) or greater than or equal to λ, then the optimal values of MMX and MMX(λ) are the same. For δ = (4λ/15)², we have c′(0) = h′(0). That is, at this value of δ we can expect that c and h will have the same numerical-stability properties near 0.

Prop. 3. For δ = (4λ/15)², we have h(w) < c(w) on (0, +∞). Moreover, c(w) − h(w) is strictly increasing on [0, +∞).

Hence the relaxation provided using c is always at least as strong as the relaxation provided using h; in fact, it is strictly stronger in any realistic case (specifically, when there is more than 1 terminal). We make a few further observations about c:

Choosing λ. Note that c′(0) = 15/(8λ), so we should not be too aggressive in picking λ extremely small. But a large derivative at 0 is the price that we pay for getting everything else; in particular, any concave function f that agrees with √w at w = 0 and w = λ² has f′(0) ≥ 1/λ. By Cor. 2, choosing λ to be around 1/(2p − 3) would seem to make sense, thus guaranteeing that our optimal solution of MMX(λ) is within a universal additive constant of the optimal value of MMX. If the points are quite close together, either they should be scaled up or λ can be decreased.

Secant lower bound. Owing to the strict concavity, in the context of an sbb algorithm, c is always best lower bounded by a secant on any subinterval. Of course, it can be an issue whether an sbb solver can recognize and exploit this. In general, such solvers (e.g., Baron and Couenne) do not yet support general piecewise-smooth concave functions, while conceptually they could and should. However, we implemented a feature in SCIP (version 3.2) that allowed us to solve the proposed relaxation. In particular, by using the AMPL modeling-language interface and its general suffix facility (see [8]), we are able to tell SCIP to treat a piecewise-defined constraint as globally concave. With this new SCIP feature we were able to solve the secant relaxation — see Section 5.
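Since c is concave, the chord of c over any subinterval lies below c, which is what the secant bound exploits. A small sketch of ours (not from the paper; λ = 0.1 as in Fig. 1) checks this on a subinterval straddling the breakpoint:

```python
import math

def c(w, lam=0.1):
    """The piecewise-smooth underestimate of sqrt(w) (breakpoint at lam**2)."""
    if w >= lam ** 2:
        return math.sqrt(w)
    return (15 / (8 * lam)) * w - (5 / (4 * lam ** 3)) * w ** 2 + (3 / (8 * lam ** 5)) * w ** 3

def secant(lb, ub, lam=0.1):
    """Affine function agreeing with c at lb and ub; by concavity of c it is a valid lower bound on [lb, ub]."""
    slope = (c(ub, lam) - c(lb, lam)) / (ub - lb)
    return lambda w: c(lb, lam) + slope * (w - lb)

lo, hi = 0.004, 0.25  # subinterval straddling the breakpoint lam**2 = 0.01
s = secant(lo, hi)
grid = [lo + k * (hi - lo) / 100 for k in range(101)]
print(all(s(w) <= c(w) + 1e-12 for w in grid))
```

The check passes across the breakpoint precisely because c is globally concave, not just concave on each piece.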

4 Tightening relaxations

4.1 Integer extreme points

Here, we examine the continuous relaxation of the feasible region of MMX, just in the space of the y-variables. That is, the set of y_ij ∈ [0, 1] satisfying (2–4). The coefficient matrix of the system (2–4) is not totally unimodular (TU). However, for each j ∈ S − {p + 1}, we can subtract the equation of (4) from the corresponding equation of (3) to arrive at the system:

∑_{j∈S} y_ij = 1, for i ∈ P, (6)

∑_{i∈P} y_ij + ∑_{k∈S, k>j} y_jk = 3, for j = p + 1; = 2, for j ∈ S − {p + 1}, (7)

∑_{k∈S, k<j} y_kj = 1, for j ∈ S − {p + 1}. (8)

Thm. 4. The set of y_ij ∈ [0, 1] satisfying (2–4) has integer extreme points.

Cor. 5. No valid linear inequality in the y-variables alone can improve the linear relaxation of the equations (2–4) describing full Steiner topologies.

This is not to say that optimality-based (linear) inequalities in the y-variables alone cannot be derived (see §4.2.2).

Cor. 6. Given a globally-optimal solution of the continuous relaxation of MMX, in polynomial time we can calculate a globally-optimal solution of MMX.

The relevant b-matching problem is depicted in Fig. 2. There is a node for every constraint of (6–8) and an edge for every variable. To the right of each node is its required degree. All possible edges exist between nodes at levels (6) and (7); i.e., those node sets induce a complete bipartite graph. All possible edges extending "down to the right" exist between nodes at levels (7) and (8); i.e., between nodes at level (7) and all nodes of greater number at level (8). A feasible choice of edges (selecting a full Steiner topology) is one that meets the degree requirements. E.g., we can see that we must choose the edge between node p + 1 of level (7) and node p + 2 of level (8).
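For small p, the degree system (6)–(8) can be enumerated directly. The sketch below is ours, for illustration only, with p = 4 (so S = {5, 6}): it lists all 0/1 solutions and confirms both the count C(4,2) = 6 of degree-feasible choices and the forced edge between nodes p + 1 and p + 2 (for p = 4, constraint (8) for j = p + 2 pins it down):

```python
from itertools import product

p = 4
P = range(1, p + 1)   # terminals
S = [5, 6]            # Steiner points p+1 .. 2p-2

solutions = []
for bits in product([0, 1], repeat=2 * p + 1):
    y, idx = {}, 0
    for i in P:
        for j in S:
            y[i, j] = bits[idx]; idx += 1
    y[5, 6] = bits[idx]
    ok = all(y[i, 5] + y[i, 6] == 1 for i in P)            # (6): each terminal picks one Steiner point
    ok &= sum(y[i, 5] for i in P) + y[5, 6] == 3           # (7) for j = p+1
    ok &= sum(y[i, 6] for i in P) == 2                     # (7) for j = 6 (no k > 6 in S)
    ok &= y[5, 6] == 1                                     # (8) for j = 6
    if ok:
        solutions.append(y)

print(len(solutions))                        # choosing which 2 of 4 terminals attach to node 5
print(all(y[5, 6] == 1 for y in solutions))  # the edge (p+1, p+2) is forced
```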

4.2 Geometric cuts

Based on various geometric considerations concerning optimal solutions, we can derive several families of valid inequalities seeking to improve relaxations of our formulation. Some of these are nonlinear, and it is an ongoing challenge to take advantage of them computationally. Until we address it in §4.2.3, assume that all norms are taken exactly — not smoothed.

Let η_i be the distance from terminal a^i to the nearest other terminal; i.e.,

η_i := min_{j∈P, j≠i} ‖a^i − a^j‖, ∀i ∈ P. (9)


Fig. 2: The bipartite b-matching model for selecting a full Steiner topology

4.2.1 Non-combinatorial cuts.

Thm. 7. For all n ≥ 2, we have

y_ik ‖x^k − a^i‖ ≤ η_i, ∀i ∈ P, k ∈ S. (10)

Lem. 8. Among triangles with edge lengths a, b, c and corresponding angles x, y, z, with c and z fixed, the one maximizing a + b is isosceles (that is, a = b, x = y).

Thm. 9. For all n ≥ 2, we have

y_ik y_jk (‖x^k − a^i‖ + ‖x^k − a^j‖) ≤ (2/√3) ‖a^i − a^j‖, ∀i, j ∈ P, i < j, k ∈ S. (11)

In computations, we treat the bilinear term y_ik y_jk by replacing it with a variable y_ijk and using the standard McCormick inequalities. Another way to try and use the same principle is as follows:

Thm. 10. For all n ≥ 2, we have

y_ik y_jk ‖x^k − a^i‖ ‖x^k − a^j‖ (‖x^k − a^i‖ − (1/√3) ‖a^i − a^j‖) = 0, ∀i, j ∈ P, i < j, k ∈ S. (12)

Thms. 9 and 10 easily extend to the case where the Steiner point x^k is adjacent to only 1 terminal in the SMT, and also to the case where x^k is not adjacent to any terminal. In such cases, we have

y_ik y_kl (‖x^k − a^i‖ + ‖x^k − x^l‖) ≤ (2/√3) ‖a^i − x^l‖, ∀i ∈ P, k, l ∈ S, k < l; (13)

y_kl y_km (‖x^k − x^l‖ + ‖x^k − x^m‖) ≤ (2/√3) ‖x^l − x^m‖, ∀k, l, m ∈ S, k < l < m. (14)

Via another geometric principle, we have the following result.

Thm. 11. For n = 3 and i, j ∈ P, i < j, k, l ∈ S, k < l, we have

y_ik y_jk y_kl · det [ a^i a^j x^k x^l ; 1 1 1 1 ] = 0. (15)

We note that the determinant of the 4 × 4 matrix in Thm. 11 is quadratic in the x-variables — actually bilinear between x^k and x^l. In computations, we treat the trilinear term y_ik y_jk y_kl by replacing it with a variable y_ijkl and using the standard inequalities

y_ijkl ≤ y_ik,  y_ijkl ≤ y_jk,  y_ijkl ≤ y_kl,
y_ik + y_jk + y_kl ≤ 2 + y_ijkl.

Thm. 11 easily extends to dimensions n > 3.

Thm. 12. For n ≥ 3, i, j ∈ P, i < j, k, l ∈ S, k < l, and for every 3 × 4 submatrix B = [b^i, b^j, ξ^k, ξ^l] of the n × 4 matrix [a^i, a^j, x^k, x^l], we have

y_ik y_jk y_kl · det [ b^i b^j ξ^k ξ^l ; 1 1 1 1 ] = 0. (16)
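The standard linearization of the trilinear term above is exact when the y-variables take binary values; a quick check of ours (not from the paper) over all eight binary assignments:

```python
from itertools import product

def linearized(a, b, c):
    """Values z in {0, 1} satisfying the linearization of z = a*b*c used in the text:
       z <= a, z <= b, z <= c  and  a + b + c <= 2 + z."""
    return [z for z in (0, 1) if z <= min(a, b, c) and a + b + c <= 2 + z]

# on binary y-variables the linearization leaves exactly the product value
print(all(linearized(a, b, c) == [a * b * c] for a, b, c in product([0, 1], repeat=3)))
```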

4.2.2 Combinatorial cuts.

In this section, we introduce some valid linear inequalities satisfied by optimal topologies of the ESTP.

Thm. 13. For n ≥ 2 and i, j ∈ P, i < j, we have:

if ‖a^i − a^j‖ > η_i + η_j, then y_ik + y_jk ≤ 1, ∀k ∈ S. (17)

Considering the valid inequalities in (17), we note that the inequalities (11) can only be active for i, j ∈ P, i < j, k ∈ S, such that ‖a^i − a^j‖ ≤ η_i + η_j. Therefore, only such valid inequalities should be included in MMX. Furthermore, the result in Thm. 13 extends to the more general case addressed in Thm. 14.
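Generating the cuts (17) only requires pairwise distances among the terminals. The sketch below is ours, with hypothetical terminals forming two distant pairs; it computes η_i as in (9) and lists the pairs (i, j) for which a cut y_ik + y_jk ≤ 1 is produced:

```python
import math

# hypothetical terminals in the plane: two tight pairs, far apart
terms = {1: (0.0, 0.0), 2: (0.3, 0.0), 3: (10.0, 0.0), 4: (10.0, 0.4)}

def dist(u, v):
    return math.dist(terms[u], terms[v])

# eta_i: distance from terminal i to its nearest other terminal, as in (9)
eta = {i: min(dist(i, j) for j in terms if j != i) for i in terms}

# pairs (i, j) triggering inequality (17): ||a^i - a^j|| > eta_i + eta_j
cuts = [(i, j) for i in terms for j in terms
        if i < j and dist(i, j) > eta[i] + eta[j]]
print(cuts)
```

Only cross-pair terminals trigger cuts here; the two close pairs (1, 2) and (3, 4) do not, matching the intuition that nearby terminals may share a Steiner point.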

Thm. 14. Let H be a graph with vertex set P, and such that i and j are adjacent if ‖a^i − a^j‖ > η_i + η_j. Then for each k ∈ S, the set of i ∈ P such that y_ik = 1 in an optimal solution is a stable set of H. Therefore, every valid linear inequality ∑_{i∈P} α_i y_i ≤ ρ for the stable set polytope of H yields a valid linear inequality ∑_{i∈P} α_i y_ik ≤ ρ for each k ∈ S.

Let T be a min-length spanning tree on the terminals a^i, i ∈ P. For i, j ∈ P, let

β_ij := length of the longest edge on the path between a^i and a^j in T. (18)
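The parameters β_ij of (18) come from a minimum spanning tree on the terminals. A small illustrative sketch of ours (hypothetical points, Prim's algorithm, then a DFS on the tree to find the longest edge on each path):

```python
import math
from itertools import combinations

# hypothetical terminals
pts = {1: (0, 0), 2: (1, 0), 3: (5, 0), 4: (5, 1)}

def d(u, v):
    return math.dist(pts[u], pts[v])

# Prim's algorithm for a min-length spanning tree on the terminals
in_tree, edges = {1}, []
while len(in_tree) < len(pts):
    u, v = min(((u, v) for u in in_tree for v in pts if v not in in_tree),
               key=lambda e: d(*e))
    in_tree.add(v); edges.append((u, v))

adj = {i: [] for i in pts}
for u, v in edges:
    adj[u].append(v); adj[v].append(u)

def beta(i, j):
    """Length of the longest edge on the unique tree path from i to j, as in (18)."""
    stack = [(i, None, 0.0)]
    while stack:
        node, parent, longest = stack.pop()
        if node == j:
            return longest
        for nxt in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, max(longest, d(node, nxt))))

print({(i, j): round(beta(i, j), 3) for i, j in combinations(sorted(pts), 2)})
```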

The proof of the following well-known lemma can be found for example in [12]. We use it to prove the validity of cuts presented in Cor. 16.

Lem. 15. An SMT contains no edge of length greater than β_ij on the unique path between a^i and a^j, for all i, j ∈ P.

Cor. 16. For n ≥ 2 and i, j ∈ P, i ≠ j, we have:

if ‖a^i − a^j‖ > η_i + η_j + β_ij, then y_ik + y_kl + y_jl ≤ 2, ∀k, l ∈ S, k < l. (19)

4.2.3 Smoothing in the context of cuts.

In the definition and derivation of all cuts, we assumed that norms are taken exactly — not smoothed. Now, we confront the issue that we prefer to work with smoothed norms in the context of mathematical optimization.

First of all, for any norm that involves only data and no variables, we do not smooth. This pertains to all occurrences of ‖a^i − a^j‖, and hence also all occurrences of the parameters defined in (9) and (18). Any valid equation or inequality based only on such use of norms (or not based on norms at all) is valid for the original problem. So, these equations and inequalities do not exclude optimal solutions of the original problem, and so these solutions are candidate solutions to the problem where distances are smoothly underestimated (employing h or c), possibly with lower objective value. Therefore, any such equation or inequality is valid for the problem with distances smoothly underestimated. This applies to (15), (16), (17), inequalities based on Thm. 14, and (19).

For (10) and (11), norms involving variables occur only on the low side of the inequalities, so smooth underestimation of these norms, employing h or c, keeps them valid. Inequalities (13) and (14) also contain norms involving variables on the high side of those inequalities. For those norms, we should replace them with smooth overestimates. For example, we could use an "overestimating shift" ĥ(w) := √(w + δ), or the linear extrapolation l (possibly with a different breakpoint). In fact, by choosing the breakpoint for l at σ² := (4λ/15)², where λ² is the breakpoint for c, we get c′(0) = l′(0) (= 15/(8λ)). That is, if we can numerically tolerate a breakpoint for the underestimate c at w = λ², then we can equally tolerate a breakpoint for the overestimate l at w = σ² = (4λ/15)². While global sbb solvers do not yet support piecewise functions, at the modeling level we could utilize tangents of the concave √· at a few values of w greater than (4λ/15)². I.e., choose some values σ_r² > ··· > σ_1² > σ_0² := (4λ/15)², let τ_{σ_i}(w) := σ_i/2 + w/(2σ_i), and we can instantiate (13) and (14) r + 1 times each, replacing the √· implicit on the high side of these inequalities with τ_{σ_i}, for i = 0, 1, …, r.

Finally, we have the equation (12). The norms ‖x^k − a^i‖ and ‖x^k − a^j‖ can be smoothed in any way that correctly evaluates the norm at 0, and the equation remains valid. So employing h or c leaves the equation valid. The norm involving x^k in the multiplicand (‖x^k − a^i‖ − ‖a^i − a^j‖/√3) is thornier. One way to address it is to simply replace it with (‖x^k − a^i‖² − ‖a^i − a^j‖²/3), with these (smooth) squared norms calculated exactly.
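The tangent overestimates τ_σ are elementary to generate: τ_σ(w) = σ/2 + w/(2σ) is the tangent of √w at w = σ², and by the AM–GM inequality it overestimates √w everywhere. A small illustrative check of ours:

```python
import math

def tau(w, sigma):
    """Tangent of sqrt at w = sigma**2; overestimates sqrt(w) everywhere (AM-GM)."""
    return sigma / 2 + w / (2 * sigma)

sigma = 4 * 0.1 / 15  # sigma_0 = 4*lam/15 with lam = 0.1, as in the text
ws = [k * 1e-4 for k in range(1, 500)]
print(all(tau(w, sigma) >= math.sqrt(w) - 1e-12 for w in ws))
print(abs(tau(sigma ** 2, sigma) - sigma))  # tangency: zero gap at w = sigma**2
```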

5 Experiments

In this section, we demonstrate the impact of our cuts on the solution of the ESTP, with results on 25 instances in ℝ³, where the terminals are randomly distributed in the cube [0, 10]³. For each value of p from 4 to 8, we created five instances. We show the effect of the cuts by solving the instances a few times with the sbb code SCIP, adding different cuts each time to MMX. In our experiments, we considered the cuts (10), (11), (13)–(15), and (17). We compare the performance of SCIP on eight models, running with a time limit of 2 hours. The first model is MMX, with no cuts; the following six are MMX with the independent addition of each class of cuts; and the last model is MMX with the addition of the three classes of cuts that were most effective in the previous experiments.

Summarizing our results, we have that adding the cuts has a significant effect on improving the lower bounds and decreasing the running time. The three classes of cuts (10), (14) and (17) together improve the lower bound computed within the time limit by 51% and 52% on the only two instances still not solved to optimality, and decrease the overall running time by 61% on average. Three extra instances, all with p = 8, were solved to optimality within the time limit after the addition of the cuts. In Table 1, we show, for each value of p, the average percentage improvement in the running time of each model tested, when compared to MMX (100 (Time(MMX) − Time(Model))/Time(MMX)). (Negative values indicate worse times with the addition of the cuts.)

p    MMX+(10)  MMX+(11)  MMX+(13)  MMX+(14)  MMX+(15)  MMX+(17)  MMX+(10,14,17)
4    51        2         20        0         -21       24        41
5    44        -10       21        0         -82       4         56
6    92        72        91        88        -462      93        93
7    81        -274      48        56        -654      76        84
8    29        0         0         6         0         15        32

Table 1: Average % improvements on running time compared to MMX

Although these are still preliminary results, we see that our cuts can potentially improve the quality of the lower bounds computed by SCIP. The two other cuts proposed, (16) and (19), may have a positive effect on instances with more terminals or in higher dimensions. Finally, we see that using several families of cuts together can bring further improvements.

6 Conclusions

The ESTP is very difficult to solve for n > 2, and the application of sbb solvers to test instances using the MMX formulation reveals considerable drawbacks, related to the non-differentiability of the Euclidean norm and also to the extreme weakness of the lower bounds given by relaxations. MMX in its original form, with today's sbb solvers, leads to dismal results. We presented different approximations for the square roots — the source of our non-differentiability — which can be judiciously applied in the context of valid inequalities. In particular, we introduced a smooth underestimation function, and we established several very appealing properties. We implemented this smoothing with a new feature of SCIP that we developed. This feature could be specialized to automatically smooth roots and other power functions. To improve the quality of lower bounds computed in an sbb algorithm, we presented a variety of valid inequalities, based on known geometric properties of an SMT. Preliminary numerical experiments demonstrate the potential of these cuts in improving lower bounds. Many of these cuts are nonconvex, and it is an interesting research direction to apply similar ideas to other nonconvex global-optimization models. We demonstrated that the performance of the MMX model can be significantly improved, though it is still not the best method for the ESTP with n > 2. So more work is needed to make the MMX approach competitive.

Acknowledgments

J. Lee was partially supported by NSF grant CMMI-1160915 and ONR grant N00014-14-1-0315, and by the Laboratoire d'Informatique de l'École Polytechnique. M. Fampa was partially supported by CNPq and FAPERJ.

References

1. T. Achterberg. SCIP: Solving constraint integer programs. Mathematical Programming Computation, 1(1):1–41, 2009.
2. C. Bragalli, C. D'Ambrosio, J. Lee, A. Lodi, and P. Toth. On the optimal design of water distribution networks. Optim. Eng., 13(2):219–246, 2012.
3. L. Cavalli-Sforza and A. Edwards. Phylogenetic analysis: Models and estimation procedures. Evolution, 21:550–570, 1967.
4. C. D'Ambrosio, J. Lee, and A. Wächter. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity. In Algorithms—ESA 2009, volume 5757 of LNCS, pages 107–118. Springer, Berlin, 2009.
5. D. Du and X. Hu. Steiner tree problems in computer communication networks. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2008.
6. M. Fampa and K. M. Anstreicher. An improved algorithm for computing Steiner minimal trees in Euclidean d-space. Discrete Optimization, 5(2):530–540, 2008.
7. M. Fampa and N. Maculan. Using a conic formulation for finding Steiner minimal trees. Numerical Algorithms, 35(2-4):315–330, 2004.
8. R. Fourer, D. M. Gay, and B. W. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Duxbury Press, 2002.
9. M. Garey, R. Graham, and D. Johnson. The complexity of computing Steiner minimal trees. SIAM J. Applied Mathematics, 1977.
10. I. Gentilini, F. Margot, and K. Shimada. The travelling salesman problem with neighbourhoods: MINLP solution. Opt. Meth. and Soft., 28(2):364–378, 2013.
11. E. Gilbert and H. Pollak. Steiner minimal trees. SIAM J. Applied Math., 1968.
12. F. Hwang, D. Richards, and P. Winter. The Steiner tree problem, volume 53 of Annals of Discrete Mathematics. Elsevier, Amsterdam, 1992.
13. N. Maculan, P. Michelon, and A. Xavier. The Euclidean Steiner tree problem in ℝⁿ: A mathematical programming formulation. Annals of Operations Research, 96(1-4):209–220, 2000.
14. W. D. Smith. How to find Steiner minimal trees in Euclidean d-space. Algorithmica, 7(1-6):137–177, 1992.
15. J. W. Van Laarhoven and K. M. Anstreicher. Geometric conditions for Euclidean Steiner trees in ℝ^d. Comput. Geom., 46(5):520–531, 2013.
16. S. Vigerske. Decomposition of multistage stochastic programs and a constraint integer programming approach to MINLP. PhD thesis, Humboldt-U. Berlin, 2013.
17. A. Wächter and L. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale NLP. Math. Prog., 106:25–57, 2006.
18. D. Warme, P. Winter, and M. Zachariasen. Exact algorithms for plane Steiner tree problems: a computational study. Advances in Steiner Trees, pages 81–116, 1998.

Computer Sciences Department

Valid Inequalities for the Pooling Problem with Binary Variables

Claudia D’Ambrosio Jeff Linderoth James Luedtke

Technical Report #1682

November 2010

Valid Inequalities for the Pooling Problem with Binary Variables

Claudia D’Ambrosio, Jeff Linderoth, James Luedtke∗

November 15, 2010

Abstract

The pooling problem consists of finding the optimal quantity of final products to obtain by blending different compositions of raw materials in pools. Bilinear terms are required to model the quality of products in the pools, making the pooling problem a non-convex continuous optimization problem. In this paper we study a generalization of the standard pooling problem where binary variables are used to model fixed costs associated with using a raw material in a pool. We derive four classes of strong valid inequalities for the problem and demonstrate that the inequalities dominate classic flow cover inequalities. The inequalities can be separated in polynomial time. Computational results are reported that demonstrate the utility of the inequalities when used in a global optimization solver.

∗Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Mechanical Engineering Building, 1513 University Ave., Madison, WI 53706, USA, {cdambrosio;linderoth;jrluedt1}@wisc.edu

1 Introduction

The pooling problem is to optimize the net cost of products whose composition is determined by blending different raw materials. The blending network consists of three types of nodes: the input streams, representing the raw materials; the pools, where the materials are blended; and the output streams, representing the final products. By deciding the flow quantity passing through the arcs of this network, the composition of the final products is determined. There are many application areas for the pooling problem, including petroleum refining, wastewater treatment, and general chemical engineering process design [2, 12, 1, 4].

Different variants of the pooling problem have been introduced in the literature. In the standard pooling problem, the topology of the network is fixed—the pools are fed only by the input streams and connected only to the output streams. The standard pooling problem may be modeled as a non-convex nonlinear program (NLP), where the nonlinearities are bilinear terms that are present to model the (linear) blending process that occurs. The generalized pooling problem involves discrete decisions, where the activation of arcs connecting different nodes of the network is to be decided. This problem can be modeled as a non-convex Mixed Integer Nonlinear Program (MINLP). In the extended pooling problem, the Environmental Protection Agency limits on complex emissions are considered and modeled. In this case, extra variables are introduced and the additional constraints are non-smooth. In many applications of the pooling problem, finding the (certified) global optimum can result in significant monetary savings, so a significant research effort has been undertaken on global optimization approaches for the problem. For an overview of the pooling problem variants and the optimization techniques that have been applied successfully for solving such problems, the reader is referred to the recent and exhaustive survey [9].
The focus of this paper is on a special case of the generalized pooling problem, where the topology of a portion of the network is also to be decided. Specifically, there are yes-no decisions associated with connecting the input streams and the pools. For example, this variant of the pooling problem is relevant for crude oil scheduling operations [6, 14], where the input streams represent supply ships feeding the storage tanks of a refinery. Starting from a mathematical formulation of the pooling problem called the PQ-formulation, we detect an interesting set X for which we derive valid inequalities. The set X is a generalization of the well-known single-node fixed-charge flow set [10]. We introduce three classes of inequalities for this set that we call quality inequalities, because they are based on the quality target for a final product. A final class of valid inequalities introduced are called pooling flow cover inequalities, as they are related to, but dominate, standard flow cover inequalities for the single-node fixed-charge flow set. There are exponentially many pooling flow cover inequalities, and we demonstrate that they can be separated in polynomial time.

The present work is one of the first attempts to generate valid inequalities for a variant of the standard pooling problem that uses specific structures present in the problem. Other approaches have applied general-purpose convexification techniques, such as the reformulation-linearization technique [8] or disjunctive programming [5]. Simultaneously (and independently), Papageorgiou et al. [11] take an approach related to ours. Specifically, they study a polyhedral set that serves as a relaxation to a blending problem with fixed-charge structure. The focus of [11] is on developing inequalities for the single-product, uncapacitated case, and they are able to find facet-defining inequalities that are useful in computation.
Our new valid inequalities were validated to be quite useful computationally, both on instances available in the open literature and on randomly generated instances. Inequalities generated at the root node of a branch-and-bound tree were added to a standard model for the problem and given to the state-of-the-art global optimization solver BARON. With the strengthened model, on a test suite of 76 instances solvable by BARON in 2 hours, adding the inequalities resulted in reducing BARON's CPU time by a factor of 2. An interesting observation from this study is that these useful valid inequalities for this mixed-integer nonconvex set were derived by studying a mixed-integer linear relaxation of the set. This suggests that it may be a useful approach in general to study mixed-integer linear relaxations of global optimization problems.

The remainder of the paper is divided into three sections. Section 2 gives a mathematical description of the problem we study. Section 3 describes the classes of inequalities we have derived, and Section 4 gives the computational results.

2 The Pooling Problem

In this section, we introduce our variant of the standard pooling problem, in which a portion of the topology of the network must be decided. In this variant, a fixed cost is paid if an arc connecting an input stream to a pool is utilized.

2.1 Mathematical formulation

Sahinidis and Tawarmalani [13] introduced the PQ-formulation for the standard pooling problem and demonstrated that it provides a tighter relaxation when the standard McCormick [7] approximation is used to relax the bilinear terms and obtain a Mixed Integer Linear Programming (MILP) problem. The PQ-formulation is used as the starting point for our model.

2.1.1 Notation

Input to the pooling problem consists of a network G = (N, A), where the nodes are partitioned into three sets N = I ∪ L ∪ J. The set I is the set of the input streams, the nodes representing the raw materials; the set L is the pool set, where the raw materials are blended; and J is the set of the output streams, the nodes representing the final products. For ease of notation, we will assume that there is a complete interconnection network between the nodes of I and L and between the nodes of L and J. That is, A = {(i, l) | i ∈ I, l ∈ L} ∪ {(l, j) | l ∈ L, j ∈ J}. The inequalities we derive later can easily be generalized to sparse networks. K is a set of input and output attributes.

Each input stream i ∈ I has an associated unit cost c_i and availability A_i. Each output stream j ∈ J has an associated unit revenue d_j and demand bound D_j. Each pool l ∈ L has a size capacity S_l. The parameters C_ik denote the level of attribute k ∈ K found in input stream i ∈ I. For each output stream j ∈ J, P^U_jk is the upper bound on the composition range of attribute k ∈ K in the final product j. Finally, there are the parameters f_il, which represent the fixed cost that has to be paid if the arc from input stream i ∈ I to pool l ∈ L is used.

Decision variables in the formulation are the following:

• q_il: proportion of flow from input i ∈ I to pool l ∈ L.
• y_lj: flow from intermediate pool node l ∈ L to output j ∈ J.
• v_il: binary variable with value 1 if the arc from input i ∈ I to pool l ∈ L is used, 0 otherwise.
• w_ilj: flow from input i ∈ I to output j ∈ J through pool l ∈ L.

2.1.2 The PQ-formulation

The objective is to maximize net profit, written as minimizing the net cost:

min ∑_{j∈J} ( ∑_{l∈L} ∑_{i∈I} c_i w_ilj − ∑_{l∈L} d_j y_lj ) + ∑_{i∈I} ∑_{l∈L} f_il v_il.

There are simple constraints that bound raw material availability, pool capacity, and final product demand:

∑_{l∈L} ∑_{j∈J} w_ilj ≤ A_i, ∀i ∈ I;  ∑_{j∈J} y_lj ≤ S_l, ∀l ∈ L;  ∑_{l∈L} y_lj ≤ D_j, ∀j ∈ J.

A distinguishing feature of the pooling problem is that there is an upper bound on the target value for each attribute of each product:

∑_{l∈L} ∑_{i∈I} C_ik w_ilj ≤ P^U_jk ∑_{l∈L} y_lj, ∀j ∈ J, ∀k ∈ K. (1)

The q variables must satisfy properties of proportion and only be positive if the associated arc is opened:

∑_{i∈I} q_il ≤ 1, ∀l ∈ L;  0 ≤ q_il ≤ v_il, ∀i ∈ I, ∀l ∈ L.

The w variables are related to the other decision variables as

w_ilj = q_il y_lj, ∀i ∈ I, ∀l ∈ L, ∀j ∈ J. (2)

Finally, in the PQ-formulation, two redundant sets of constraints are added to the formulation to make subsequent relaxations stronger:

∑_{i∈I} w_ilj = y_lj, ∀l ∈ L, ∀j ∈ J;  ∑_{j∈J} w_ilj ≤ q_il S_l, ∀i ∈ I, ∀l ∈ L.

This formulation, augmented with appropriate bounds on the variables, is given to the global optimization solver BARON in our subsequent computational experiments. Let Y_lj := min{S_l, D_j, ∑_{i∈I} A_i} be an upper bound on the flow from l ∈ L to j ∈ J. By replacing the nonlinear equations (2) with the well-known McCormick inequalities [7], a linear relaxation of the problem is formed. The McCormick inequalities in this context reduce to the following:

0 ≤ w_ilj ≤ q_il Y_lj;  w_ilj ≤ y_lj;  w_ilj ≥ Y_lj q_il + y_lj − Y_lj, ∀i ∈ I, ∀l ∈ L, ∀j ∈ J.

Branching on the continuous variables q and y, as well as on the binary variables v, is required in order to converge to a globally optimal solution.
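The McCormick relaxation of a single term w = q·y is easy to state as a predicate. The sketch below is ours (not from the paper): it checks that a true bilinear point satisfies the relaxation, but also that a point with w ≠ q·y survives it, which is exactly why branching on q and y is needed:

```python
def mccormick_ok(w, q, y, Y):
    """McCormick relaxation of w = q*y with q in [0, 1] and y in [0, Y]."""
    return 0 <= w <= q * Y and w <= y and w >= Y * q + y - Y

Y = 100.0
print(mccormick_ok(0.3 * 40.0, 0.3, 40.0, Y))  # a true bilinear point satisfies it
print(mccormick_ok(20.0, 0.3, 40.0, Y))        # ... but so does this point, where w != q*y
```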

2.2 Example

Figure 1: Pooling problem example.

To demonstrate the valid inequalities we derive, we will use the simple example shown in Figure 1. There is a single pool, a single product, and a single attribute (|L| = |J| = |K| = 1). The three input streams have input qualities C_1 = 3, C_2 = 1, C_3 = 2, and there is an upper bound of P^U = 2.5 on the target quality. There is an upper bound of Y = 100 on the final product.
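For this example, the single quality constraint (1) specializes to a linear test on the blend: C_1 w_1 + C_2 w_2 + C_3 w_3 ≤ P^U (w_1 + w_2 + w_3). A small illustrative check of ours:

```python
# qualities C = (3, 1, 2) and target P_U = 2.5 as in the example
C, P_U = (3.0, 1.0, 2.0), 2.5

def on_spec(w):
    """Constraint (1) for |L| = |J| = |K| = 1: sum_i C_i w_i <= P_U * sum_i w_i."""
    return sum(ci * wi for ci, wi in zip(C, w)) <= P_U * sum(w)

print(on_spec((10.0, 90.0, 0.0)))  # mostly low-quality input: on spec
print(on_spec((90.0, 10.0, 0.0)))  # mostly high-attribute input: off spec
```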

3 Valid inequalities

In this section, we extract an appropriate subset of the constraints of the formulation presented in Section 2 and derive a number of strong valid inequalities for this relaxation. To that end, we focus on a single output stream and a single attribute and define the sets

I⁺ = {i ∈ I | C_i − P^U ≥ 0}, and I⁻ = {i ∈ I | C_i − P^U < 0},

where we have dropped the irrelevant indices on the parameters. Next, we define α_i = |C_i − P^U| ∀i ∈ I, substitute into equation (1), and use the fact that y_l = ∑_{i∈I} w_il, to extract the set

X = { (w, q, v) ∈ ℝ^{2|I||L|} × {0,1}^{|I||L|} | ∑_{l∈L} ∑_{i∈I⁺} α_i w_il − ∑_{l∈L} ∑_{i∈I⁻} α_i w_il ≤ 0; ∑_{i∈I} q_il ≤ 1, ∀l ∈ L; w_il ≤ Y_l q_il, q_il ≤ v_il, ∀i ∈ I, ∀l ∈ L },

which is a relaxation of the set of feasible solutions to the pooling problem presented in Section 2. The set X is composed of a single-node flow constraint, upper bounds on the flow variables based on the proportions q (coming from the McCormick inequalities), variable upper bounds on the proportion variables q, and the definition equations on the proportion variables q.

3.1 Quality inequalities

We define the following two classes of inequalities ∀i ∈ I⁺, ∀i* ∈ I⁻, ∀l ∈ L:

(α₁/α_{i*}) w_il − (α₁/α_{i*} − 1) ∑_{i′∈I⁻_>(α_{i*})} Y_l (v_{i′l} − q_{i′l}) − (α₁/α_i) Y_l (v_il − q_il) − (α₁/(α_i α_{i*})) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ 0 (3)

α_i w_il − ∑_{i′∈I⁻_>(α_{i*})} (α_{i′} − α_{i*}) w_{i′l} − α_{i*} Y_l (v_il − q_il) − ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ 0 (4)

where α₁ = max_{i′∈I⁻} α_{i′} and I⁻_>(α_{i*}) = {i′ ∈ I⁻ | α_{i′} > α_{i*}}.

Proposition 3.1. Inequality (3) is valid for X, ∀i ∈ I⁺, ∀i* ∈ I⁻, ∀l ∈ L.

Proof. Case v_il = 0: (3) becomes

(α₁/α_{i*} − 1) ∑_{i′∈I⁻_>(α_{i*})} Y_l (v_{i′l} − q_{i′l}) + (α₁/(α_i α_{i*})) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≥ 0,

which holds because α₁ ≥ α_{i*}, v_{i′l} ≥ q_{i′l}, and w ≥ 0.

Case ∑_{i′∈I⁻_>(α_{i*})} v_{i′l} = 0: it reduces to

α_i w_il ≤ α_{i*} Y_l (1 − q_il) + ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} = α_{i*} Y_l ( ∑_{i′∈I⁻} q_{i′l} + ∑_{i′∈I⁺: i′≠i} q_{i′l} ) + ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′}

α_i w_il ≤ α_{i*} Y_l ∑_{i′∈I⁻\I⁻_>(α_{i*})} q_{i′l} + ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′}, which is always valid.

Case ∑_{i′∈I⁻_>(α_{i*})} v_{i′l} > 0:

(α₁/α_{i*}) w_il − (α₁/α_{i*} − 1) ∑_{i′∈I⁻_>(α_{i*})} Y_l (1 − q_{i′l}) ≤ (α₁/α_i) Y_l (1 − q_il) + (α₁/(α_i α_{i*})) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′}

(α₁/α_{i*}) w_il − (α₁/α_{i*} − 1) Y_l q_il ≤ (α₁/α_i) Y_l ∑_{i′∈I⁻} q_{i′l} + (α₁/(α_i α_{i*})) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′}

Y_l q_il ≤ (α₁/α_i) Y_l ∑_{i′∈I⁻} q_{i′l} + (α₁/(α_i α_{i*})) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′}

α_i Y_l q_il − α₁ Y_l ∑_{i′∈I⁻} q_{i′l} − (α₁/α_{i*}) ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ 0, always valid because weaker than

∑_{i′∈I⁺} α_{i′} w_{i′l} − ∑_{i′∈I⁻} α_{i′} w_{i′l} − ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ 0.

Proposition 3.2. Inequality (4) is valid for X, ∀i ∈ I⁺, ∀i* ∈ I⁻, ∀l ∈ L.

Proof. Case v_il = 0: it reduces to ∑_{i′∈I⁻_>(α_{i*})} (α_{i′} − α_{i*}) w_{i′l} + ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≥ 0, which holds because α_{i′} > α_{i*} for i′ ∈ I⁻_>(α_{i*}) and w ≥ 0. Otherwise,

α_i w_il − ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ ∑_{i′∈I⁻_>(α_{i*})} (α_{i′} − α_{i*}) w_{i′l} + α_{i*} Y_l (1 − q_il)

α_i w_il − ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ ∑_{i′∈I⁻_>(α_{i*})} ( α_{i′} w_{i′l} + α_{i*} (Y_l q_{i′l} − w_{i′l}) ) + ∑_{i′∈I⁻\I⁻_>(α_{i*})} α_{i*} Y_l q_{i′l}

α_i w_il − ∑_{l′≠l} ∑_{i′∈I⁻} α_{i′} w_{i′l′} ≤ ∑_{i′∈I⁻} α_{i′} w_{i′l}

In order to make the definition of inequalities (3) and (4) more clear, consider the example of Section 2.2. For the example, α_1 = 0.5, α_2 = 1.5, α_3 = 0.5 and I⁺ = {1}, I⁻ = {2, 3}. The inequalities (3) and (4), defined for indices i = 1, i* = 3, respectively, are

3w_1 − 200(v_2 − q_2) − 300(v_1 − q_1) ≤ 0 (5)
0.5w_1 − w_2 − 50(v_1 − q_1) ≤ 0. (6)

A final valid inequality limits the flow from inputs that exceed the target quality.

Proposition 3.3. The following inequality is valid for X, ∀i ∈ I⁺, ∀l ∈ L:

α_i w_il − Y_l α₁ (v_il − q_il) − v_il ∑_{l′≠l} Y_{l′} α₁ ≤ 0. (7)

Proof. We consider two cases. Case v_il = 0: inequality (7) reduces to 0 ≤ 0. Otherwise, using the inequality q_il ≤ 1 − ∑_{i′∈I⁻} q_{i′l}, we see that (7) is weaker than α_i w_il − Y_l α₁ ∑_{i′∈I⁻} q_{i′l} − ∑_{l′≠l} Y_{l′} α₁ ≤ 0, which is valid because it is weaker than the valid inequality ∑_{l′∈L} ∑_{i′∈I⁺} α_{i′} w_{i′l′} ≤ ∑_{l′∈L} ∑_{i′∈I⁻} α_{i′} w_{i′l′}.

Let us consider again the example of Section 2.2. Inequality (7) for i = 1 is

$$0.5 w_1 - 150 (v_1 - q_1) - 0 \leq 0. \qquad (8)$$
None of the three classes of inequalities introduced dominates the others since, in general, they can cut off different fractional solutions. Specifically, for the example problem,

• Consider the solution $q = (0.1; 0.9; 0)$, $y = 100$, $w = (10; 90; 0)$, $v = (0.15; 0.9; 0)$. The inequalities reduce to $30 - 0 - 15 \leq 0$; $5 - 90 - 2.5 - 0 \leq 0$; $5 - 7.5 \leq 0$, and only the first inequality cuts off this infeasible solution.
• Consider the solution $q = (0.5; 0.1; 0.4)$, $y = 100$, $w = (50; 10; 40)$, $v = (0.7; 0.9; 0.4)$. The inequalities reduce to $150 - 160 - 60 \leq 0$; $25 - 10 - 10 \leq 0$; $25 - 30 \leq 0$, and only the second one excludes the fractional solution.
• Finally, consider the solution $q = (0.5; 0.5; 0)$, $y = 100$, $w = (50; 50; 0)$, $v = (0.6; 1; 0)$. The inequalities reduce to $150 - 100 - 30 \leq 0$; $25 - 50 - 5 \leq 0$; $25 - 15 \leq 0$, and only the last inequality cuts off the fractional solution.

3.2 Pooling flow cover inequalities

In this section we focus on the case of a single pool and hence, for notational convenience, drop the index $l$ from the discussion. The primary result of the section is the generalization of a flow cover inequality:

Proposition 3.4. Let $C^+ \subseteq I^+$ with $\lambda = \sum_{i \in C^+} \alpha_i Y > 0$, let $S^- \subseteq I^-$, and define $u^*_{C^+} = \max_{i \in C^+} \alpha_i Y$. The following pooling flow cover inequality is valid for $X$:

$$\sum_{i \in C^+} \alpha_i w_i - \sum_{i \in S^-} \big[ u^*_{C^+} (v_i - q_i) \big] - \sum_{i \in I^- \setminus S^-} \alpha_i w_i \leq 0. \qquad (9)$$
Proof. Let $(w, q, v) \in X$ and define the set $T = \{ i \mid v_i = 1 \}$. If $|S^- \cap T| = 0$, inequality (9) reduces to $\sum_{i \in C^+} \alpha_i w_i - \sum_{i \in I^- \setminus S^-} \alpha_i w_i \leq 0$, which is valid because $w_i = 0$ for all $i \in S^-$. Now suppose $|S^- \cap T| \geq 1$. Since $-\sum_{i \in I^- \setminus S^-} \alpha_i w_i \leq 0$ because $w_i \geq 0$ $\forall i \in I$, it is sufficient to prove that $\sum_{i \in C^+} \alpha_i w_i - \sum_{i \in S^-} \big[ u^*_{C^+} (v_i - q_i) \big] \leq 0$. First, observe that
$$\frac{1}{u^*_{C^+}} \Big( \sum_{i \in C^+} \alpha_i w_i - \sum_{i \in S^-} \big[ u^*_{C^+} (v_i - q_i) \big] \Big) \leq \sum_{i \in C^+} \frac{\alpha_i w_i}{\alpha_i Y} - \sum_{i \in S^-} (v_i - q_i) \leq \sum_{i \in C^+} q_i - \sum_{i \in S^-} (v_i - q_i).$$
Thus, we will be done if we prove that $\sum_{i \in C^+} q_i - \sum_{i \in S^-} (v_i - q_i) \leq 0$, which follows from

$$\sum_{i \in C^+} q_i - \sum_{i \in S^- \cap T} (1 - q_i) = \sum_{i \in C^+} q_i + \sum_{i \in S^- \cap T} q_i - |S^- \cap T| \leq 0$$
since $|S^- \cap T| \geq 1$.

The next simple proposition shows that the inequalities (9) are stronger than the generalized flow cover inequalities (see [15]):
$$\sum_{i \in C^+} \alpha_i w_i - \sum_{i \in S^-} \lambda v_i - \sum_{i \in I^- \setminus S^-} \alpha_i w_i \leq 0. \qquad (10)$$
Proposition 3.5. Inequalities (10) are implied by (9), for every $C^+ \subseteq I^+$ such that $\lambda = \sum_{i \in C^+} \alpha_i Y > 0$ and $S^- \subseteq I^-$.

Proof. By definition, $u^*_{C^+} \leq \lambda$, and hence $-\sum_{i \in S^-} u^*_{C^+} (v_i - q_i) \geq -\sum_{i \in S^-} u^*_{C^+} v_i \geq -\sum_{i \in S^-} \lambda v_i$.

Now let us consider the multiple-pool, one-output-stream, one-attribute case. (Thus, we add the index $l \in L$ back to the variables.)

Proposition 3.6. For each pool $l \in L$, subset of inputs $S^- \subseteq I^-$, and cover $C^+ \subseteq I^+$, define $u^{*l}_{C^+} = \max_{i \in C^+} \alpha_i Y_l$. The following inequalities are valid for $X$:

$$\sum_{i \in C^+} \alpha_i w_{il} - \sum_{i \in S^-} \big[ u^{*l}_{C^+} (v_{il} - q_{il}) \big] - \sum_{i \in I^- \setminus S^-} \alpha_i w_{il} - \sum_{i \in I^-} \sum_{l' \in L \setminus \{l\}} \alpha_i w_{il'} \leq 0. \qquad (11)$$
Proof. The proof of Proposition 3.4 is valid also in this case.

3.2.1 The separation problem

Given a fractional solution $(w^*, q^*, v^*) \notin X$, we want to find $C^+ \subseteq I^+$, $S^- \subseteq I^-$, and a pool $l \in L$ such that (11) is violated. Let the maximum violation of such constraints be

$$z_{SEP} = \max_{C^+ \subseteq I^+,\; l \in L} \; \sum_{i \in C^+} \alpha_i w^*_{il} - \sum_{i \in I^-} \min \big( u^*_{C^+} (v^*_{il} - q^*_{il}),\; \alpha_i w^*_{il} \big) - \sum_{i \in I^-} \sum_{l' \in L \setminus \{l\}} \alpha_i w^*_{il'}.$$
If $z_{SEP} > 0$, inequality (11) is violated for $(C^+, S^-, l)$ with $S^- = \{ i \in I^- \mid u^*_{C^+} (v^*_{il} - q^*_{il}) < \alpha_i w^*_{il} \}$. Note that the solution with the maximum violation has the following nice property: if $i \in C^+$, then $i' \in C^+$ for all $i'$ such that $\alpha_{i'} \leq \alpha_i$. (This follows since, by the definition of $u^*_{C^+}$, if $\alpha_{i'} \leq \alpha_i$, the inequality can only be made more violated by including $i'$ in $C^+$.) Thus, the separation problem may be solved exactly in polynomial time by considering all the $\alpha_i$ ($i \in I^+$) in sorted order over all the pools. Algorithm 1 gives pseudocode for the separation algorithm.
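A runnable sketch of this enumeration follows (our own illustration, not the authors' code; the data structures are hypothetical stand-ins for the model data, with `alpha_plus`/`alpha_minus` mapping inputs to their α values, `Y` mapping pools to their capacities, and `w`, `q`, `v` holding the fractional solution):

```python
def separate(alpha_plus, alpha_minus, Y, w, q, v):
    """Enumerate prefix covers C+ = {1..c} of the inputs in I+, sorted by
    non-decreasing alpha, over all pools l, and keep the most violated
    pooling flow cover inequality (a sketch of the separation loop)."""
    plus = sorted(alpha_plus, key=lambda i: alpha_plus[i])
    best = (0.0, None)  # (violation, (cover, S_minus, pool))
    for c in range(1, len(plus) + 1):
        cover = plus[:c]
        for l in Y:
            u_star = max(alpha_plus[i] * Y[l] for i in cover)
            sigma = sum(alpha_plus[i] * w[i, l] for i in cover)
            sigma -= sum(min(u_star * (v[i, l] - q[i, l]),
                             alpha_minus[i] * w[i, l]) for i in alpha_minus)
            sigma -= sum(alpha_minus[i] * w[i, lp]
                         for i in alpha_minus for lp in Y if lp != l)
            if sigma > best[0]:
                S = [i for i in alpha_minus
                     if u_star * (v[i, l] - q[i, l]) < alpha_minus[i] * w[i, l]]
                best = (sigma, (cover, S, l))
    return best  # a positive violation means a cut (C+, S-, l) was found
```

A positive first component of the returned pair identifies a violated inequality to add to the relaxation.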

4 Computational results

The utility of the quality and pooling flow cover inequalities for solving instances of our version of the generalized pooling problem was tested using a cut-and-branch approach. Models were created using the GAMS modeling language both for the original problem and for the continuous linear relaxation of the problem, wherein the nonlinear constraints $w_{ilj} = q_{il} y_{lj}$ are replaced by their McCormick envelopes (as described in Section 2.1) and the constraints $v_{il} \in \{0, 1\}$ are replaced with $0 \leq v_{il} \leq 1$. The continuous relaxation

Algorithm 1 Algorithm for solving the separation problem.
1: set $\hat\sigma = -\infty$, $\hat c = -\infty$, $\hat l = -\infty$;
2: order the $\alpha_i$ ($i \in I^+$) such that $\alpha_1 \leq \alpha_2 \leq \cdots \leq \alpha_{|I^+|}$;
3: for $c = 1, \ldots, |I^+|$ do
4:   for $l \in L$ do
5:     $\sigma = \sum_{i=1}^{c} \alpha_i w^*_{il} - \sum_{i \in I^-} \min(\alpha_c Y_l (v^*_{il} - q^*_{il}), \alpha_i w^*_{il}) - \sum_{i \in I^-} \sum_{l' \in L \setminus \{l\}} \alpha_i w^*_{il'}$;
6:     if $\sigma > \hat\sigma$ then
7:       set $\hat\sigma = \sigma$, $\hat c = c$ and $\hat l = l$;
8:     end if
9:   end for
10: end for
11: if $\hat\sigma > 0$ then
12:   add the cut for $(C^+, S^-, \hat l)$ with $C^+ = \{1, \ldots, \hat c\}$, $l = \hat l$ and $S^- = \{ i \in I^- \mid u^*_{C^+}(v^*_{il} - q^*_{il}) < \alpha_i w^*_{il} \}$;
13: end if

is iteratively solved, where at each iteration, if the solution is fractional and the separation problems find violated inequalities, they are added to the linear relaxation, and the relaxation is re-solved. The process repeats until the fractional solution can no longer be separated, after which all the inequalities generated are added to the MINLP model. The augmented MINLP model is solved with BARON [13] using a CPU time limit of 2 hours. We compare against the performance of BARON on the model without the separated quality and pooling flow cover inequalities. Computational results were obtained on a heterogeneous cluster of computers. For each instance, the model was solved on the same machine both with and without cuts, so while the CPU times cannot be compared between instances, the relative times of BARON with and without cuts are comparable. Our first test suite consisted of 11 instances from the literature, collected by Sahinidis and Tawarmalani [13]. For these instances, a fixed cost of 500 was added to every input arc. Some of these instances contain bypass arcs, i.e., arcs that connect input streams directly to outputs. Bypass arcs are treated in the inequalities as additional pools with only one input.
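The McCormick envelope used above to relax the bilinear terms is the standard construction; a minimal sketch of the implied bounds (our own illustration, not the authors' GAMS formulation):

```python
def mccormick_bounds(q, y, qL, qU, yL, yU):
    """Bounds on the bilinear term w = q*y implied by the four McCormick
    inequalities over the box [qL, qU] x [yL, yU]."""
    lower = max(qL * y + q * yL - qL * yL,   # the two under-estimators
                qU * y + q * yU - qU * yU)
    upper = min(qU * y + q * yL - qU * yL,   # the two over-estimators
                qL * y + q * yU - qL * yU)
    return lower, upper
```

For example, at $q = 0.5$, $y = 50$ with $q \in [0, 1]$ and $y \in [0, 100]$, the envelope gives the interval $[0, 50]$, which contains the true product $25$.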
Results of the experiment are given in Table 1, where we report, for each instance, the value of the global optimum, the number of nodes and the CPU time needed for BARON to find the optimum both with and without the addition of the violated quality and pooling flow cover inequalities, and the number of inequalities found. In general, the instances from the literature were far too small to draw any meaningful conclusions. The number of nodes is smaller in 4 of the 11 cases, in 6 cases it is the same, and in one case more nodes are taken after adding the inequalities. A second test set consisted of 90 randomly generated instances of various sizes. The random instances were parameterized by the average graph density $\beta$, the number of inputs $|I|$, the number of pools $|L|$, the number of outputs $|J|$, and the number of attributes $|K|$, and named $\beta$-$|I|$-$|L|$-$|J|$-$|K|$ based on their size. For each combination of tested parameters, 10 instances were generated, and we report average performance results over all instances in the family. All the instances are available at the webpage http://www.or.deis.unibo.it/research pages/ORinstances/ORinstances.htm. Results of this experiment are summarized in Table 2. Without adding inequalities, BARON is able to solve 76 of the 90 instances. Adding the inequalities, BARON is able to solve three additional instances of the 90. Averages in Table 2 are taken only over the instances that both methods can solve. Complete results detailing the computational performance on specific instances are given on the website http://www.or.deis.unibo.it/research pages/

Table 1: BARON Performance With and Without Cutting Planes

Instance | GO | no cuts: # nodes, CPU time | # cuts | with cuts: # nodes, CPU time
adhya1   630.0     7    0.14   20    7    0.14
adhya2   630.0     1    0.07   15    5    0.11
adhya3   978.0     17   0.43   8     1    0.12
adhya4   529.2     17   0.84   12    7    0.54
bental4  500.0     9    0.02   4     1    0.02
bental5  -2000.0   1    0.12   0     1    0.13
foulds2  266.7     1    0.04   6     1    0.02
haverly1 500.0     9    0.01   4     1    0.01
haverly2 400.0     9    0.01   4     9    0.01
haverly3 100.0     1    0.01   4     1    0.01
rt2      -1672.6   356  1.02   0     356  1.05

Table 2: Average Performance of BARON With and Without Cutting Planes

Instance Family | # Solved | no cuts: Avg Nodes, Avg Time | Avg # Cuts | with cuts: Avg Nodes, Avg Time
20-15-10-10-1   9    1052    14.9     11.1   383    7.3
20-15-10-10-2   10   7850    638.2    15.3   3338   440.3
20-15-10-10-4   10   2637    109.5    11.9   2241   168.5
30-15-5-10-1    9    22009   520.5    12.8   13095  367.4
30-15-5-10-2    8    7384    406.8    19.6   3988   239.0
30-15-5-10-4    10   6041    361.1    27.0   1884   109.9
30-15-8-10-1    3    21971   1689.6   11.7   6504   478.3
30-15-8-10-2    9    15663   823.9    19.7   4303   337.6
30-15-8-10-4    8    30424   1035.3   22.0   5472   457.2

ORinstances/ORinstances.htm. For the 76 solved instances, the total CPU time required to solve the instances reduces from 39927 seconds to 20602 seconds when the cuts are added to the model before calling BARON. The total number of nodes required to solve the 76 instances reduces from 882173 to 329851 by adding cuts. The most significant performance improvement occurs on the instances that have 30% network density and 8 pools (the 30-15-8-10-x family), indicating that larger, denser instances may benefit most from the addition of the quality and pooling flow cover inequalities. A performance profile of the CPU time on the 79 instances solvable by either method is given in Figure 2. In the curves shown in Figure 2, the point (X, p) is on the graph for the associated method if a fraction p of the 79 instances solved by either method were solved within a factor X of the best of the two methods. For a more complete description of performance profiles, the reader is referred to [3]. From the results of the second experiment, it is clear that the addition of the inequalities has a significant beneficial impact on computational performance.
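The points of such a profile can be computed directly from the raw timings; a sketch (ours), assuming every instance is solved by at least one of the two methods:

```python
def performance_profile(times_a, times_b):
    """Compute performance-profile ratios for two methods. times_a[k] and
    times_b[k] are the solve times on instance k; for each method the
    returned sorted list holds X = time / best-of-the-two per instance."""
    ratios_a, ratios_b = [], []
    for ta, tb in zip(times_a, times_b):
        best = min(ta, tb)          # assumes best is finite
        ratios_a.append(ta / best)
        ratios_b.append(tb / best)
    return sorted(ratios_a), sorted(ratios_b)

def profile_value(ratios, X):
    """Fraction p of instances solved within a factor X of the best method."""
    return sum(r <= X for r in ratios) / len(ratios)
```

Plotting `profile_value` against a grid of X values reproduces one curve of Figure 2 per method.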

Figure 2: Performance Profile of Solution Time
[Plot: fraction of instances p (vertical axis, 0 to 1) solved within a factor X (horizontal axis, 0 to 25) of the faster method; one curve each for "With Cuts" and "Without Cuts".]

Acknowledgement

The authors would like to thank Ahmet Keha for bringing reference [11] to their attention. This work benefitted from numerous insightful discussions with Andrew Miller.

References

[1] M. Bagajewicz. A review of recent design procedures for water networks in refineries and process plants. Computers & Chemical Engineering, 24:2093–2113, 2000.

[2] C.W. DeWitt, L.S. Lasdon, A.D. Waren, D.A. Brenner, and S. Melham. OMEGA: An improved gasoline blending system for Texaco. Interfaces, 19:85–101, 1989.

[3] Elizabeth Dolan and Jorge Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91:201–213, 2002.

[4] J. Kallrath. Mixed integer optimization in the chemical process industry: Experience, potential and future perspectives. Chemical Engineering Research and Design, 78:809–822, 2000.

[5] R. Karuppiah and I.E. Grossmann. Global optimization for the synthesis of integrated water systems in chemical processes. Computers & Chemical Engineering, 30:650–673, 2006.

[6] H.M. Lee, J.M. Pinto, I.E. Grossmann, and S. Park. Mixed-integer linear programming model for refinery short-term scheduling of crude oil unloading with inventory management. Industrial & Engi- neering Chemistry Research, 35:1630–1641, 1996.

[7] G.P. McCormick. Computability of global solutions to factorable nonconvex programs: Part 1 - convex underestimating problems. Mathematical Programming, 10:147–175, 1976.

[8] C.A. Meyer and C.A. Floudas. Global optimization of a combinatorially complex generalized pooling problem. AIChE Journal, 52:1027–1037, 2006.

[9] R. Misener and C.A. Floudas. Advances for the pooling problem: Modeling, global optimization, & computational studies. Applied and Computational Mathematics, 8:3–22, 2009.

[10] M. Padberg, T. J. Van Roy, and L. Wolsey. Valid linear inequalities for fixed charge problems. Opera- tions Research, 33:842–861, 1985.

[11] D. J. Papageorgiou, A. Toriello, G. L. Nemhauser, and M.W.P. Savelsbergh. Fixed-charge transporta- tion with product blending. Unpublished manuscript.

[12] B. Rigby, L.S. Lasdon, and A.D. Waren. The evolution of Texaco’s blending systems: From OMEGA to StarBlend. Interfaces, 25:64–83, 1995.

[13] N.V. Sahinidis and M. Tawarmalani. Accelerating branch-and-bound through a modeling language construct for relaxation-specific constraints. Journal of Global Optimization, 32:259–280, 2005.

[14] N. Shah. Mathematical programming techniques for crude oil scheduling. Computers & Chemical Engineering, 20:S1227–S1232, 1996.

[15] L.A. Wolsey and G.L. Nemhauser. Integer and Combinatorial Optimization. Wiley, New York, NY, USA, 1988.

AN ALGORITHMIC FRAMEWORK FOR MINLP WITH SEPARABLE NON-CONVEXITY

CLAUDIA D'AMBROSIO∗, JON LEE†, AND ANDREAS WÄCHTER‡

Abstract. We present an algorithm for Mixed-Integer Nonlinear Programming (MINLP) problems in which the non-convexity in the objective and constraint functions is manifested as the sum of non-convex univariate functions. We employ a lower bounding convex MINLP relaxation obtained by approximating each non-convex function with a piecewise-convex underestimator that is repeatedly refined. The algorithm is implemented at the level of a modeling language. Favorable numerical results are presented.

Key words. mixed-integer nonlinear programming, global optimization, spatial branch-and-bound, separable, non-convex

AMS(MOS) subject classifications. 65K05, 90C11, 90C26, 90C30

1. Introduction. The global solution of practical instances of Mixed-Integer NonLinear Programming (MINLP) problems has been considered for some decades. Over a considerable period of time, technology for the global optimization of convex MINLP (i.e., the continuous relaxation of the problem is a convex program) had matured (see, for example, [8, 17, 9, 3]), and rather recently there has been considerable success in the realm of global optimization of non-convex MINLP (see, for example, [18, 16, 13, 2]). Global optimization algorithms, e.g., spatial branch-and-bound approaches like those implemented in codes like BARON [18] and COUENNE [2], have had substantial success in tackling complicated, but generally small-scale, non-convex MINLPs (i.e., mixed-integer nonlinear programs having non-convex continuous relaxations). Because they are aimed at a rather general class of problems, the possibility remains that larger instances from a simpler class may be amenable to a simpler approach. We focus on MINLPs for which the non-convexity in the objective and constraint functions is manifested as the sum of non-convex univariate functions. There are many problems that are already in such a form, or can be brought into such a form via some simple substitutions. In fact, the first step in spatial branch-and-bound is to bring problems into nearly such a form. For our purposes, we assume that the model already has this form. We have developed a simple algorithm, implemented at the level of a modeling language (in our case AMPL, see [10]), to attack such separable problems. First, we identify subintervals of convexity and concavity for the univariate functions using external calls to MATLAB [14]. With such an identification at hand, we develop a convex MINLP relaxation of the problem (i.e., a mixed-integer nonlinear program having a convex continuous relaxation).
Our convex MINLP relaxation differs from those typically employed in spatial branch-and-bound; rather than relaxing the graph of a univariate function on an interval to an enclosing polygon, we work on each subinterval of convexity and concavity separately, using linear relaxation on only the “concave side” of each function on the subintervals. The subintervals are glued together using binary variables. Next, we employ ideas of spatial branch-and-bound, but rather than branching, we repeatedly refine our convex MINLP relaxation by modifying it at the modeling level. We attack our convex MINLP relaxation, to get lower bounds on the global minimum, using the code BONMIN [3, 4] as a black-box convex MINLP solver. Next, by fixing the integer variables in the original non-convex MINLP, and then locally solving the associated non-convex NLP restriction, we get an upper bound on the global minimum, using the code IPOPT [19]. We use the solutions found by BONMIN and IPOPT to guide our choice of further refinements. We implemented our framework using the modeling language AMPL. In order to obtain all of the information necessary for the execution of the algorithm, external software, specifically the

∗Dept. of ECSS, University of Bologna, Italy. Email: [email protected]
†IBM T.J. Watson Research Center, NY, U.S.A. Email: [email protected]
‡IBM T.J. Watson Research Center, NY, U.S.A. Email: [email protected]

tool for high-level computational analysis MATLAB, the convex MINLP solver BONMIN, and the NLP solver IPOPT, are called directly from the AMPL environment. A detailed description of the entire

algorithmic framework, together with a proof of its convergence, is provided in §2.

We present computational results in §3. Some of the instances arise from specific applications; in particular, Uncapacitated Facility Location problems, Hydro Unit Commitment and Scheduling problems, and Nonlinear Continuous Knapsack problems. We also present computational results on selected instances of GLOBALLib and MINLPLib. We have had significant success in our preliminary computational experiments. In particular, we see very few major iterations occurring, with most of the time being spent in the solution of a small number of convex MINLPs. As we had hoped, our method does particularly well on problems for which the non-convexity is naturally separable. An advantage of our approach is that it can be implemented easily using existing software components and that further advances in technology for convex MINLP will immediately give us a proportional benefit. Finally, we note that a preliminary shorter version of the present paper appeared as [7]. 2. Our algorithmic framework. We focus now on MINLPs, where the non-convexity in the objective and constraint functions is manifested as the sum of non-convex univariate functions. Without loss of generality, we take them to be of the form

$$
\begin{array}{ll}
\min & \sum_{j \in N} C_j x_j \\
\text{subject to} & f(x) \leq 0 ; \\
& r_i(x) + \sum_{k \in H(i)} g_{ik}(x_k) \leq 0 , \quad \forall i \in M ; \qquad (\mathcal{P}) \\
& L_j \leq x_j \leq U_j , \quad \forall j \in N ; \\
& x_j \text{ integer}, \quad \forall j \in I ,
\end{array}
$$
where $N := \{1, 2, \ldots, n\}$, $f : \mathbb{R}^n \to \mathbb{R}^p$ and $r_i : \mathbb{R}^n \to \mathbb{R}$, $\forall i \in M$, are convex functions, $H(i) \subseteq N$ $\forall i \in M$, the $g_{ik} : \mathbb{R} \to \mathbb{R}$ are non-convex univariate functions $\forall i \in M$, and $I \subseteq N$. Letting $H := \cup_{i \in M} H(i)$, we can take each $L_j$ and $U_j$ to be finite or infinite for $j \in N \setminus H$, but for $j \in H$ we assume that these are finite bounds.
We assume that the problem functions are sufficiently smooth (e.g., twice continuously differentiable), with the exception that we allow the univariate $g_{ik}$ to be continuous functions defined piecewise by sufficiently smooth functions over a finite set of subintervals of $[L_k, U_k]$. Without loss of generality, we have taken the objective function to be linear and all of the constraints to be inequalities, further of the less-than-or-equal variety. Linear equality constraints could be included directly in this formulation, while we assume that nonlinear equalities have been split into two inequality constraints.
Our approach is an iterative technique based on three fundamental ingredients:
• A reformulation method with which we obtain a convex MINLP relaxation $\mathcal{Q}$ of the original problem $\mathcal{P}$. Solving the convex MINLP relaxation $\mathcal{Q}$, we obtain a lower bound on our original problem $\mathcal{P}$;
• A non-convex NLP restriction $\mathcal{R}$ of the original MINLP problem $\mathcal{P}$ obtained by fixing the variables within the set $\{ x_j : j \in I \}$. Locally solving the non-convex NLP restriction $\mathcal{R}$, we obtain an upper bound on our original problem $\mathcal{P}$;
• A refinement technique aimed at improving, at each iteration, the quality of the lower bound obtained by solving the convex MINLP relaxation $\mathcal{Q}$.
The main idea of our algorithmic framework is to iteratively solve a lower-bounding relaxation $\mathcal{Q}$ and an upper-bounding restriction $\mathcal{R}$ so that, in case the values of the upper and the lower bound coincide, the global optimality of the solution found is proven; otherwise we make a refinement to the lower-bounding relaxation $\mathcal{Q}$. At each iteration, we seek to decrease the gap between the lower and the upper bound, and hopefully, before too long, the gap will be within a tolerance value, or the lower-bounding solution will be deemed sufficiently feasible for the original problem. In these cases, or in the case a time/iteration limit is reached, the algorithm stops. If the gap is closed, we have found a global optimum; otherwise we have a heuristic solution (provided that the upper bound is not $+\infty$). The lower-bounding relaxation $\mathcal{Q}$ is a convex relaxation of the original non-convex MINLP problem, obtained by approximating the concave part of the non-convex univariate functions using piecewise-linear approximation. The novelty in this part of the algorithmic framework is the new formulation of the convex relaxation: the function is approximated only where it is concave, while the convex parts of the functions are

not approximated, but taken as they are. The proposed convex relaxation is described in detail in §2.1. The upper-bounding restriction $\mathcal{R}$, described in §2.2, is obtained simply by fixing the variables with integrality constraints. The refinement technique consists of adding one or more breakpoints where needed, i.e., where the approximation of the non-convex function is bad and

the solution of the lower-bounding problem lies. Refinement strategies are described in §2.3, and once the ingredients of the algorithmic framework are described in detail, we give a pseudo-code

description of our algorithmic framework (see §2.4). Here, we also discuss some considerations

about the general framework and the similarities and differences with popular global optimization methods. Theoretical convergence guarantees are discussed in §2.5. In §3, computational experiments are presented, detailing the performance of the algorithm and comparing the approach to other methods.

2.1. The lower-bounding convex MINLP relaxation $\mathcal{Q}$. To obtain our convex MINLP relaxation $\mathcal{Q}$ of the MINLP problem $\mathcal{P}$, we need to locate the subintervals of the domain of each univariate function $g_{ik}$ for which the function is uniformly convex or concave. For simplicity of notation, rather than refer to the constraint $r_i(x) + \sum_{k \in H(i)} g_{ik}(x_k) \leq 0$, we consider a term of the form $g(x_k) := g_{ik}(x_k)$, where $g : \mathbb{R} \to \mathbb{R}$ is a univariate non-convex function of $x_k$, for some $k$ ($1 \leq k \leq n$).
We want to explicitly view each such $g$ as a piecewise-defined function, where on each piece the function is either convex or concave. This feature also allows us to handle functions that are already piecewise defined by the modeler. In practice, for each non-convex function $g$, we compute the points at which the convexity/concavity may change, i.e., the zeros of the second derivative of $g$, using MATLAB. In case a function $g$ is naturally piecewise defined, we are essentially refining its piecewise definition in such a way that the convexity/concavity is uniform on each piece. Example 1. Consider the piecewise-defined univariate function

$$g(x_k) := \begin{cases} 1 + (x_k - 1)^3 , & \text{for } 0 \leq x_k \leq 2 ; \\ 1 + (x_k - 3)^2 , & \text{for } 2 \leq x_k \leq 4 , \end{cases}$$
depicted in Fig. 1. In addition to the breakpoints $x_k = 0, 2, 4$ of the definition of $g$, the convexity/concavity changes at $x_k = 1$, so by utilizing an additional breakpoint at $x_k = 1$ the convexity/concavity is now uniform on each piece.
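As an aside, the sign change of $g''$ that produces the extra breakpoint at $x_k = 1$ can be located numerically; the scan below is our own stand-in for the MATLAB-based step described above (names are illustrative):

```python
def g2(x):
    # second derivative of Example 1's g, piece by piece
    return 6.0 * (x - 1.0) if x <= 2.0 else 2.0

def sign_changes(f, lo, hi, steps=400):
    """Scan [lo, hi] on a grid and bracket the points where f changes sign;
    each bracket is refined by bisection to locate the inflection point."""
    pts = []
    h = (hi - lo) / steps
    for k in range(steps):
        a, b = lo + k * h, lo + (k + 1) * h
        if f(a) == 0.0:
            pts.append(a)
        elif f(a) * f(b) < 0.0:
            while b - a > 1e-9:          # bisection refinement
                m = 0.5 * (a + b)
                (a, b) = (m, b) if f(a) * f(m) > 0.0 else (a, m)
            pts.append(0.5 * (a + b))
    return pts
```

On Example 1 the scan recovers the single inflection point at $x_k = 1$; the jump of $g''$ at $x_k = 2$ does not change sign, so no spurious point is reported.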

Fig. 1. A piecewise-defined univariate function. Fig. 2. A piecewise-convex lower approximation. Fig. 3. Improved piecewise-convex lower approximation.

Now, on each concave piece we can use a secant approximation to give a piecewise-convex lower approximation of g .
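A generic secant helper (our own illustration) producing the linear piece used on each concave subinterval:

```python
def secant_underestimator(g, a, b):
    """Return the linear secant through (a, g(a)) and (b, g(b)); on an
    interval where g is concave, this line underestimates g."""
    slope = (g(b) - g(a)) / (b - a)
    return lambda x: g(a) + slope * (x - a)

def g(x):
    # Example 1's univariate function
    return 1.0 + (x - 1.0) ** 3 if x <= 2.0 else 1.0 + (x - 3.0) ** 2

# On the concave piece [0, 1] the secant passes through (0, 0) and (1, 1),
# so it is simply the line x.
sec = secant_underestimator(g, 0.0, 1.0)
```

The secant agrees with $g$ at the endpoints of the concave piece and lies below it in between, exactly as in Fig. 2.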

Example 1, continued. Relative to $g(x_k)$ of Example 1, we have the piecewise-convex lower approximation
$$\check g(x_k) := \begin{cases} x_k , & \text{for } 0 \leq x_k \leq 1 ; \\ 1 + (x_k - 1)^3 , & \text{for } 1 \leq x_k \leq 2 ; \\ 1 + (x_k - 3)^2 , & \text{for } 2 \leq x_k \leq 4 , \end{cases}$$
depicted in Fig. 2. We can obtain a better lower bound by refining the piecewise-linear lower approximation on the concave pieces. We let
$$L_k =: P_0 < P_1 < \cdots < P_{\bar p} := U_k \qquad (2.1)$$
be the ordered breakpoints at which the convexity/concavity of $g$ changes, including, in the case of a piecewise definition of $g$, the points at which the definition of $g$ changes. We define:

$[P_{p-1}, P_p]$ := the $p$-th subinterval of the domain of $g$ ($p \in \{1, \ldots, \bar p\}$);
$\check H$ := the set of indices of subintervals on which $g$ is convex;
$\hat H$ := the set of indices of subintervals on which $g$ is concave.

On the concave intervals, we will allow further breakpoints. We let $B_p$ be the ordered set of breakpoints for the concave interval indexed by $p \in \hat H$. We denote these breakpoints as $P_{p-1} =: X_{p,1} < \cdots < X_{p,|B_p|} := P_p$.

$$\delta_p = \begin{cases} P_p - P_{p-1} , & \text{if } 1 \leq p \leq p^* - 1 ; \\ x^*_k - P_{p^*-1} , & \text{if } p = p^* ; \\ 0 , & \text{otherwise.} \end{cases}$$

Constraints (2.6–2.8) ensure that, for each concave interval, the convex combination of the breakpoints is correctly computed. Finally, (2.2) approximates the original non-convex univariate function $g(x_k)$. Each single term of the first and the second summations, using the definition of $\delta_p$, reduces, respectively, to

$$g(P_{p-1} + \delta_p) = \begin{cases} g(P_p) , & \text{if } p \in \{1, \ldots, p^* - 1\} ; \\ g(x^*_k) , & \text{if } p = p^* ; \\ g(P_{p-1}) , & \text{if } p \in \{p^* + 1, \ldots, \bar p\} , \end{cases}$$
and

$$\sum_{b \in B_p} g(X_{p,b}) \, \alpha_{p,b} = \begin{cases} g(P_p) , & \text{if } p \in \{1, \ldots, p^* - 1\} ; \\ \sum_{b \in B_{p^*}} g(X_{p^*,b}) \, \alpha_{p^*,b} , & \text{if } p = p^* ; \\ g(P_{p-1}) , & \text{if } p \in \{p^* + 1, \ldots, \bar p\} , \end{cases}$$
reducing expression (2.2) to

$$\sum_{p=1}^{p^*-1} g(P_p) + \gamma + \sum_{p=p^*+1}^{\bar p} g(P_{p-1}) - \sum_{p=1}^{\bar p - 1} g(P_p) = \gamma ,$$
where
$$\gamma = \begin{cases} g(x^*_k) , & \text{if } p^* \in \check H ; \\ \sum_{b \in B_{p^*}} g(X_{p^*,b}) \, \alpha_{p^*,b} , & \text{if } p^* \in \hat H . \end{cases}$$
So, if $x^*_k$ is in a subinterval on which $g$ is convex, then the approximation (2.2) is exact; while if $x^*_k$ is in a subinterval on which $g$ is concave, then the approximation is a piecewise-linear underestimation of $g$. Constraints (2.8) define $|\hat H|$ Special Ordered Sets of Type 2 (SOS2), i.e., ordered sets of positive variables among which at most 2 can assume a non-zero value and, in this case, they must be consecutive (see Beale and Tomlin [1]). Unfortunately, at the moment, convex MINLP solvers do not typically handle SOS2 as most MILP solvers do (also defining special-purpose branching strategies). For this reason, we substitute constraints (2.8), $\forall p \in \hat H$, with new binary variables $y_{p,b}$, $b \in \{1, \ldots, |B_p| - 1\}$, and constraints:
$$\alpha_{p,b} \leq y_{p,b-1} + y_{p,b} \quad \forall b \in B_p ; \qquad (2.8.a)$$
$$\sum_{b=1}^{|B_p|-1} y_{p,b} = 1 , \qquad (2.8.b)$$
with dummy values $y_{p,0} = y_{p,|B_p|} = 0$. In the future, when convex MINLP solvers handle the definition of SOS2, the variables $y$ and constraints (2.8.a–b) will not be necessary. It is important to note that if we utilized a very large number of breakpoints at the start, solving the resulting convex MINLP $\mathcal{Q}$ would essentially mean solving globally the original MINLP $\mathcal{P}$ up to some pre-determined tolerance related to the density of the breakpoints. But of course such a convex MINLP $\mathcal{Q}$ would be too hard to solve in practice. With our algorithmic framework, we dynamically seek a significantly smaller convex MINLP $\mathcal{Q}$, thus generally more easily solvable, which we can use to guide the non-convex NLP restriction $\mathcal{R}$ to a good local solution, eventually settling on and proving global optimality of such a solution to the original MINLP $\mathcal{P}$.
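The SOS2 property that the binaries $y_{p,b}$ in (2.8.a–b) enforce on the weights $\alpha_{p,\cdot}$ can be stated directly; a small checker sketch (our own illustration):

```python
def is_sos2(alpha, tol=1e-9):
    """Check the SOS2 property: at most two of the weights are nonzero,
    and if two are, they are consecutive."""
    support = [b for b, a in enumerate(alpha) if abs(a) > tol]
    if len(support) > 2:
        return False
    return len(support) < 2 or support[1] - support[0] == 1
```

Any feasible assignment of the $\alpha_{p,b}$ under (2.8.a–b) passes this check, since the single active binary $y_{p,b}$ allows only the adjacent pair $\alpha_{p,b}$, $\alpha_{p,b+1}$ to be nonzero.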

2.2. The upper-bounding non-convex NLP restriction $\mathcal{R}$. Given a solution $\hat x$ of the convex MINLP relaxation $\mathcal{Q}$, the upper-bounding restriction $\mathcal{R}$ is defined as the non-convex NLP:
$$
\begin{array}{ll}
\min & \sum_{j \in N} C_j x_j \\
\text{subject to} & f(x) \leq 0 ; \\
& r_i(x) + \sum_{k \in H(i)} g_{ik}(x_k) \leq 0 , \quad \forall i \in M ; \qquad (\mathcal{R}) \\
& L_j \leq x_j \leq U_j , \quad \forall j \in N ; \\
& x_j = \hat x_j , \quad \forall j \in I .
\end{array}
$$
A solution of this non-convex NLP $\mathcal{R}$ is a heuristic solution of the non-convex MINLP problem $\mathcal{P}$ for two reasons: (i) the integer variables $x_j$, $j \in I$, might not be fixed at globally optimal values; (ii) the NLP $\mathcal{R}$ is non-convex, and so even if the integer variables $x_j$, $j \in I$, are fixed at globally optimal values, the NLP solver may only find a local optimum of the non-convex NLP $\mathcal{R}$, or may even fail to find a feasible point. This consideration emphasizes the importance of the lower-bounding relaxation $\mathcal{Q}$ for the guarantee of global optimality. The resolution of the upper-bounding problem can be seen as a "verification phase" in which a solution of the convex MINLP relaxation $\mathcal{Q}$ is tested to be really feasible for the non-convex MINLP $\mathcal{P}$. To emphasize this, the NLP solver for $\mathcal{R}$ is given the solution of the convex MINLP relaxation $\mathcal{Q}$ as starting point.
2.3. The refinement technique. At the end of each iteration, we have two solutions: $\hat x$, the solution of the lower-bounding convex MINLP relaxation $\mathcal{Q}$, and $\bar x$, the solution of the upper-bounding non-convex NLP restriction $\mathcal{R}$; in case we cannot find a solution of $\mathcal{R}$, e.g., if $\mathcal{R}$ is infeasible, then no $\bar x$ is available. If $\sum_{j \in N} C_j \hat x_j = \sum_{j \in N} C_j \bar x_j$ within a certain tolerance, or if $\hat x$ is sufficiently feasible for the original constraints, we return to the user the point $\bar x$ or $\hat x$, respectively. Otherwise, in order to continue, we want to refine the approximation of the lower-bounding convex MINLP relaxation $\mathcal{Q}$ by adding further breakpoints. We employed two strategies:
• Based on the lower-bounding problem solution $\hat x$: For each $i \in M$ and $k \in H(i)$, if $\hat x_k$ lies in a concave interval of $g_{ik}$, add $\hat x_k$ as a breakpoint for the relaxation of $g_{ik}$.
This procedure drives the convergence of the overall method, since it makes sure that the lower-bounding problem eventually becomes a sufficiently accurate approximation of the original problem in the neighborhood of the global solution. Since adding a breakpoint increases the size of the convex MINLP relaxation, in practice we do not add such a new breakpoint if it would be within some small tolerance of an existing breakpoint for $g_{ik}$.
• Based on the upper-bounding problem solution $\bar x$: For each $i \in M$ and $k \in H(i)$, if $\bar x_k$ lies in a concave interval of $g_{ik}$, add $\bar x_k$ as a breakpoint for the relaxation of $g_{ik}$. The motivation behind this option is to accelerate the convergence of the method. If the solution found by the upper-bounding problem is indeed the global solution, the relaxation should eventually be exact at this point in order to prove its optimality. Again, to keep the size of the relaxation MINLP manageable, breakpoints are only added if they are not too close to existing ones.
We found that these strategies work well together. Hence, at each major iteration, we add a breakpoint in each concave interval where $\hat x$ lies in order to converge, and one where $\bar x$ lies to speed up the convergence.
2.4. The algorithmic framework. Algorithm 1 details our SC-MINLP (Sequential Convex MINLP) Algorithm, while Figure 4 depicts, at a high level, how we designed our implementation of it. For an optimization problem, val($\cdot$) refers to its optimal objective value. At each iteration, the lower-bounding MINLP relaxation $\mathcal{Q}$ and the upper-bounding NLP restriction $\mathcal{R}$ are redefined: what changes in $\mathcal{Q}$ are the sets of breakpoints that refine the piecewise-linear approximation of the concave parts of the non-convex functions. At each iteration, the number of breakpoints used increases, and so does the accuracy of the approximation. What may change in $\mathcal{R}$ are the values of the fixed integer variables $x_j$, $j \in I$.
Moreover, what changes is the starting point given to the NLP solver, derived from an optimal solution of the lower-bounding MINLP relaxation $\mathcal{Q}$.
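The tolerance rule for adding breakpoints described in §2.3 can be sketched as follows (our own illustration; the tolerance value is an assumption):

```python
import bisect

def refine(breakpoints, x, tol=1e-4):
    """Insert x into the sorted breakpoint list unless it lies within tol of
    an existing breakpoint; returns True if a breakpoint was added."""
    j = bisect.bisect_left(breakpoints, x)
    for neighbor in breakpoints[max(j - 1, 0):j + 1]:
        if abs(neighbor - x) <= tol:
            return False
    breakpoints.insert(j, x)
    return True
```

At each major iteration this rule would be applied once with the relevant component of $\hat x$ and once with that of $\bar x$, for every concave interval in which they lie.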

Algorithm 1: SC-MINLP (Sequential Convex MINLP) Algorithm

Choose tolerances ε, ε_feas > 0; initialize LB := −∞, UB := +∞;
find P^i_p, Ĥ^i, Ȟ^i, X^i_pb (∀ i ∈ M, p ∈ {1, …, p^i}, b ∈ B^i_p);
repeat
    solve the convex MINLP relaxation Q of the original problem P to obtain x̄;
    if val(Q) > LB then
        LB := val(Q);
        if x̄ is feasible for the original problem P (within tolerance ε_feas) then
            return x̄;
        end if
    end if
    solve the non-convex NLP restriction R of the original problem P to obtain x̂;
    if a solution x̂ could be computed and val(R) < UB then
        UB := val(R); x^UB := x̂;
    end if
    if UB − LB > ε then
        update B^i_p, X^i_pb;
    end if
until (UB − LB ≤ ε) or (time or iteration limit exceeded)
return the current best solution x^UB
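The driver loop of Algorithm 1 can be transcribed compactly. The sketch below is illustrative Python only (the paper's actual implementation is an AMPL script): solve_Q, solve_R, and refine are placeholders for the BONMIN/IPOPT solver calls, and the early exit on a feasible relaxation solution is omitted for brevity:

```python
def sc_minlp(solve_Q, solve_R, refine, eps=1e-4, max_iter=100):
    """Skeleton of Algorithm 1. solve_Q() solves the convex MINLP relaxation Q
    and returns (val(Q), x_bar); solve_R(x_bar) solves the non-convex NLP
    restriction R and returns (val(R), x_hat), or None on failure; refine(x_bar)
    adds breakpoints to tighten Q."""
    LB, UB, x_best = float("-inf"), float("inf"), None
    for _ in range(max_iter):
        val_q, x_bar = solve_Q()
        LB = max(LB, val_q)                 # lower bound from the relaxation
        res = solve_R(x_bar)
        if res is not None and res[0] < UB: # upper bound from the restriction
            UB, x_best = res
        if UB - LB <= eps:
            break
        refine(x_bar)                       # tighten Q and iterate
    return LB, UB, x_best

# Toy harness: the mock "relaxation" halves its gap to the true optimum (1.0)
# at every refinement, so the loop must close the gap below eps.
state = {"lb": 0.0}
LB, UB, x = sc_minlp(lambda: (state["lb"], None),
                     lambda xb: (1.0, xb),
                     lambda xb: state.update(lb=0.5 * (state["lb"] + 1.0)))
assert UB == 1.0 and UB - LB <= 1e-4
```

The toy harness only exercises the control flow; in the real framework the bound sequence is produced by re-solving Q after each breakpoint insertion.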

[Figure 4 (schematic): an AMPL script coordinates the initialization (convexity analysis via MATLAB), the lower-bounding relaxation Q (solved by a MINLP solver), the upper-bounding restriction R (solved by an NLP solver), and the refinement step.]

Fig. 4. The SC-MINLP (Sequential Convex MINLP) framework

Our algorithmic framework bears comparison with spatial branch-and-bound, a successful technique in global optimization. In particular:
• during the refining phase, the parts in which the approximation is bad are discovered and the approximation is improved there, but we do it by adding one or more breakpoints instead of branching on a continuous variable as in spatial branch-and-bound;
• like spatial branch-and-bound, our approach is a rigorous global-optimization algorithm rather than a heuristic;
• unlike spatial branch-and-bound, our approach does not utilize an expression tree; it works directly on the broad class of separable non-convex MINLPs of the form P, and of course on problems that can be put in such a form;
• unlike standard implementations of spatial branch-and-bound methods, we can directly keep multivariate convex functions in our relaxation instead of using linear approximations;
• unlike spatial branch-and-bound, our method can be effectively implemented at the modeling-language level.

2.5. Convergence Analysis. For the theoretical convergence analysis of Algorithm 1, we make the following assumptions, denoting by l the iteration counter for the repeat loop.
A1. The functions f(x) and r_i(x) are continuous, and the univariate functions g_ik in (P) are uniformly Lipschitz-continuous with a bounded Lipschitz constant L_g.
A2. The sets of breakpoints (2.1) are correctly identified.
A3. The problem P has a feasible point. Hence, for each l, the relaxation Q^l is feasible, and we assume its (globally) optimal solution x^l is computed.
A4. The refinement technique described in Section 2.3 adds a breakpoint for every lower-bounding problem solution x^l, even if it is very close to an existing breakpoint.
A5. The feasibility tolerance ε_feas and the optimality gap tolerance ε are both chosen to be zero, and no iteration limit is set.

Theorem 2.1. Under assumptions A1–A5, Algorithm 1 either terminates at a global solution of the original problem P, or each limit point of the sequence {x^l}_{l=1}^∞ is a global solution of P.

Proof. By construction, Q^l is always a relaxation of P, and hence val(Q^l) is always less than or equal to the value val(P) of the objective function at the global solution of P. If the algorithm terminates in a finite number of iterations, it either returns an iterate x^l that is feasible for P (after the feasibility test for P) and therefore a global solution for P, or it returns a best upper-bounding solution x^UB, which as a solution for R is feasible for P and has the same value as any global solution for P (since UB = LB).

In the case that the algorithm generates an infinite sequence {x^l} of iterates, let x* be a limit point of this sequence, i.e., there exists a subsequence {x^l̃}_{l̃=1}^∞ of {x^l}_{l=1}^∞ converging to x*.
As a solution of Q^l̃, each x^l̃ satisfies the constraints of P, except for

$$r_i(x) + \sum_{k \in H(i)} g_{ik}(x_k) \le 0, \quad \forall i \in M, \qquad (2.9)$$

because the g_ik(x_k) terms are replaced by a piecewise convex relaxation, see (2.2). We denote the values of their approximation by g̃^l̃_ik(x_k). Then, the solutions of Q^l̃ satisfy

$$r_i(x^{\tilde l}) + \sum_{k \in H(i)} \tilde g^{\tilde l}_{ik}(x^{\tilde l}_k) \le 0, \quad \forall i \in M. \qquad (2.10)$$

Now choose l̃₁, l̃₂ with l̃₂ > l̃₁. Because the approximation g̃^l̃₂_ik(x_k) is defined to coincide with the convex parts of g_ik(x_k) and is otherwise a linear interpolation between breakpoints (note that x^l̃₁_k is a breakpoint for Q^l̃₂ if x^l̃₁_k is in an interval where g_ik(x_k) is concave), the Lipschitz-continuity of the g_ik(x_k) gives us

$$\tilde g^{\tilde l_2}_{ik}(x^{\tilde l_2}_k) \ge g_{ik}(x^{\tilde l_1}_k) - L_g\,\big|x^{\tilde l_2}_k - x^{\tilde l_1}_k\big|.$$

Together with (2.10) we therefore obtain

$$r_i(x^{\tilde l_2}) + \sum_{k \in H(i)} g_{ik}(x^{\tilde l_1}_k) \;\le\; r_i(x^{\tilde l_2}) + \sum_{k \in H(i)} \tilde g^{\tilde l_2}_{ik}(x^{\tilde l_2}_k) + \sum_{k \in H(i)} L_g\,\big|x^{\tilde l_2}_k - x^{\tilde l_1}_k\big| \;\le\; \sum_{k \in H(i)} L_g\,\big|x^{\tilde l_2}_k - x^{\tilde l_1}_k\big|$$

for all i ∈ M. Because l̃₁, l̃₂ with l̃₂ > l̃₁ have been chosen arbitrarily, taking the limit as l̃₁, l̃₂ → ∞ and using the continuity of r_i shows that x* satisfies (2.9). The continuity of the remaining constraints in P ensures that x* is feasible for P. Because val(Q^l̃) ≤ val(P) for all l̃, we finally obtain f(x*) ≤ val(P), so that x* must be a global solution of P.

3. Computational results. We implemented our algorithmic framework as an AMPL script, and we used MATLAB as a tool for numerical convexity analysis, BONMIN as our convex MINLP solver, and IPOPT as our NLP solver.

We used MATLAB to detect the subintervals of convexity and concavity for the non-convex univariate functions in the model. In particular, MATLAB reads a text file generated by the AMPL script, containing the constraints with univariate non-convex functions, together with the names and bounds of the independent variables. With this information, using the Symbolic Math Toolbox, MATLAB first computes the formula for the second derivative of each univariate non-convex function, and then computes its zeros to split the function into subintervals of convexity and concavity. The zeros are computed in the following manner. The second derivative is evaluated at 100 evenly-spaced points on the interval defined by the variable's lower and upper bounds. Then, between points where there is a change in the sign of the second derivative, another 100 evenly-spaced points are evaluated. Finally, the precise location of each breakpoint is computed using the MATLAB function "fzero". For each univariate non-convex function, we use MATLAB to return the number of subintervals, the breakpoints, and the associated function values in a text file which is read by the AMPL script.

In this section we present computational results for four problem categories. The tests were executed on a single processor of an Intel Core2 CPU 6600, 2.40 GHz with 1.94 GB of RAM, using a time limit of 2 hours per instance. The relative optimality gap and feasibility tolerance used for all the experiments is 10^{-4}, and we do not add a breakpoint if it would be within 10^{-5} of an existing breakpoint. For each set of problems, we describe the non-convex MINLP model P. Two tables with computational results exhibit the behavior of our algorithm on some instances of each problem class.
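The zero-finding scheme (coarse sampling of the second derivative, then a root search between sign changes) can be mimicked in a few lines. This is an illustrative Python sketch, not the paper's MATLAB code: it collapses the two sampling stages into one coarse grid, approximates the second derivative by finite differences instead of symbolic differentiation, and uses a stdlib bisection in place of fzero:

```python
def second_derivative(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def inflection_points(f, lo, hi, coarse=100, tol=1e-10):
    """Locate zeros of f'' on [lo, hi]: scan a coarse grid for sign changes of
    the second derivative, then bisect each bracketing interval."""
    def d2(x):
        return second_derivative(f, x)
    xs = [lo + i * (hi - lo) / coarse for i in range(coarse + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        if d2(a) * d2(b) < 0:              # sign change -> inflection inside
            while b - a > tol:             # bisection stands in for fzero
                m = 0.5 * (a + b)
                if d2(a) * d2(m) <= 0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return roots

# Example: x^3 - 3x on [-1, 2] has a single inflection point at x = 0,
# splitting the interval into a concave and a convex part.
bps = inflection_points(lambda x: x ** 3 - 3.0 * x, -1.0, 2.0)
assert len(bps) == 1 and abs(bps[0]) < 1e-5
```

The returned breakpoints are exactly the data the AMPL script needs to glue convex and concave subintervals together with binary variables.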
The first table presents the iterations of our SC-MINLP Algorithm, with the columns labeled as follows:
• instance: the instance name;
• var/int/cons: the total number of variables, the number of integer variables, and the number of constraints in the convex relaxation Q;
• iter #: the iteration count;
• LB: the value of the lower bound;
• UB: the value of the upper bound;
• int change: indicates whether the integer variables in the lower-bounding solution x̄ are different compared to the previous iteration;
• time MINLP: the CPU time needed to solve the convex MINLP relaxation Q to optimality (in seconds);
• # br added: the number of breakpoints added at the end of the previous iteration.

The second table presents comparisons of our SC-MINLP Algorithm with COUENNE and BONMIN. COUENNE is an open-source branch-and-bound algorithm aimed at the global solution of MINLP problems [2, 6]. It is an exact method for the problems we address in this paper. BONMIN is an open-source code for solving general MINLP problems [3, 4], but it is an exact method only for convex MINLPs. Here, BONMIN's nonlinear branch-and-bound option was chosen. When used for solving non-convex MINLPs, the solution returned is not guaranteed to be a global optimum. However, a few heuristic options are available in BONMIN, specifically designed to treat non-convex MINLPs. Here, we use the option that allows solving the root node with a user-specified number of different randomly-chosen starting points, continuing with the best solution found. This heuristic use of BONMIN is in contrast to its use in SC-MINLP, where BONMIN is employed only for the solution of the convex MINLP relaxation Q.
The columns in the second table have the following meaning:
• instance: the instance name;
• var/int/cons: the total number of variables, the number of integer variables, and the number of constraints;
• for each approach, namely SC-MINLP, COUENNE, BONMIN 1, and BONMIN 50, we report:
  – time (LB): the CPU time (or the value of the lower bound, in parentheses, if the time limit is reached);

  – UB: the value of the upper bound.
BONMIN 1 and BONMIN 50 both refer to the use of BONMIN, but they differ in the number of solutions of the root node: in the first case, the root node is solved just once, while in the second case, 50 randomly-generated starting points are given to the root-node NLP solver. If BONMIN reached the time limit, we do not report the lower bound, because BONMIN cannot determine a valid lower bound for a non-convex problem.

3.1. Uncapacitated Facility Location (UFL) problem. The UFL application is presented in [12]. The set of customers is denoted by T and the set of facilities by K (w_kt is the fraction of the demand of customer t satisfied by facility k, for each t ∈ T, k ∈ K). Univariate non-convexity in the model arises due to nonlinear shipment costs. The UFL model formulation is as follows:

$$\begin{array}{ll}
\min & \sum_{k \in K} C_k y_k + \sum_{t \in T} v_t \\
\text{subject to} & v_t \ge -\sum_{k \in K} S_{kt}\, s_{kt}, \quad \forall t \in T; \\
& s_{kt} \le g_{kt}(w_{kt}), \quad \forall t \in T,\ k \in K; \\
& w_{kt} \le y_k, \quad \forall t \in T,\ k \in K; \\
& \sum_{k \in K} w_{kt} = 1, \quad \forall t \in T; \\
& w_{kt} \ge 0, \quad \forall t \in T,\ k \in K; \\
& y_k \in \{0, 1\}, \quad \forall k \in K.
\end{array}$$

Figure 5 depicts the three different nonlinear functions g_kt(w_kt) that were used for the computational results presented in Tables 1 and 2. The dashed line depicts the non-convex function, while the solid line indicates the initial piecewise-convex underestimator. Note that the third function was intentionally designed to be pathological and challenging for SC-MINLP. The remaining problem data were randomly generated.

Fig. 5. UFL: shapes of g_kt(w_kt) for the three instances: (a) ufl 1, (b) ufl 2, (c) ufl 3. [Plots omitted: three panels on [0, 1] showing each non-convex function and its initial piecewise-convex underestimator.]

Table 1 Results for Uncapacitated Facility Location problem

| instance | var/int/cons | iter # | LB | UB | int change | time MINLP | # br added |
| ufl 1 | 153/39/228 | 1 | 4,122.000 | 4,330.400 | - | 1.35 | - |
| ... |  | 2 | 4,324.780 | 4,330.400 | no | 11.84 | 11 |
| ... |  | 3 | 4,327.724 | 4,330.400 | no | 19.17 | 5 |
| ... |  | 4 | 4,328.993 | 4,330.400 | no | 30.75 | 5 |
|  | 205/65/254 | 5 | 4,330.070 | 4,330.400 | no | 45.42 | 5 |
| ufl 2 | 189/57/264 | 1 | 27,516.600 | 27,516.569 | - | 4.47 | - |
| ufl 3 | 79/21/101 | 1 | 1,947.883 | 2,756.890 | - | 2.25 | - |
| ... |  | 2 | 2,064.267 | 2,756.890 | no | 2.75 | 2 |
|  | 87/25/105 | 3 | 2,292.743 | 2,292.777 | no | 3.06 | 2 |

In Table 1 the performance of SC-MINLP is shown. For the first instance, the global optimum is found at the first iteration, but 4 more iterations are needed to prove global optimality. In the second instance, only one iteration is needed. In the third instance, the first feasible solution found is not the global optimum, which is found at the third (and last) iteration. Table 2 demonstrates the good performance of SC-MINLP. In particular, instance ufl 1 is solved in about 117 seconds compared to

Table 2 Results for Uncapacitated Facility Location problem

| instance | var/int/cons original | SC-MINLP time (LB) | SC-MINLP UB | COUENNE time (LB) | COUENNE UB | BONMIN 1 time | BONMIN 1 UB | BONMIN 50 time | BONMIN 50 UB |
| ufl 1 | 45/3/48 | 116.47 | 4,330.400 | 529.49 | 4,330.400 | 0.32 | 4,330.400 | 369.85 | 4,330.400 |
| ufl 2 | 45/3/48 | 17.83 | 27,516.569 | 232.85 | 27,516.569 | 0.97 | 27,516.569 | 144.06 | 27,516.569 |
| ufl 3 | 32/2/36 | 8.44 | 2,292.777 | 0.73 | 2,292.775 | 3.08 | 2,292.777 | 3.13 | 2,292.775 |

530 seconds needed by COUENNE, and instance ufl 2 in less than 18 seconds compared to 233 seconds. On instance ufl 3, COUENNE performs better than SC-MINLP, but this instance is quite easy for both algorithms. BONMIN 1 finds solutions to all three instances very quickly, and these solutions turn out to be globally optimal (note, however, that BONMIN 1 is a heuristic algorithm, and no guarantee of global optimality is given). BONMIN 50 also finds the three global optima, but in non-negligible time (greater than that needed by SC-MINLP in 2 out of 3 instances).

3.2. Hydro Unit Commitment and Scheduling problem. The Hydro Unit Commitment and Scheduling problem is described in [5]. Univariate non-convexity in the model arises due to the dependence of the power produced by each turbine on the water flow passing through the turbine. The following model was used for the computational results of Tables 3 and 4:

$$\begin{array}{ll}
\min & -\sum_{j \in J} \sum_{t \in T} \big( \Delta t\, \Pi_t\, p_{jt} - C_j\, w_{jt} - (D_j + \Pi_t E_j)\, y_{jt} \big) \\
\text{subject to} & v_{\tilde t} - V_{\tilde t} = 0; \\
& v_t - v_{t-1} - 3600\, \Delta t \big( I_t - \sum_{j \in J} q_{jt} - s_t \big) = 0, \quad \forall t \in T; \\
& q_{jt} - (Q^-_j u_{jt} + Q_j\, g_{jt}) \ge 0, \quad \forall j \in J,\ t \in T; \\
& q_{jt} - (Q^-_j u_{jt} + \overline Q_j\, g_{jt}) \le 0, \quad \forall j \in J,\ t \in T; \\
& \sum_{j \in J} (q_{jt} - q_{j(t-1)}) + \Delta q^- \ge 0, \quad \forall t \in T; \\
& \sum_{j \in J} (q_{jt} - q_{j(t-1)}) - \Delta q^+ \le 0, \quad \forall t \in T; \\
& s_t - \sum_{j \in J} (W_j w_{jt} + Y_j y_{jt}) \ge 0, \quad \forall t \in T; \\
& \sum_{j \in J} q_{jt} + s_t - \Theta \ge 0, \quad \forall t \in T; \\
& g_{jt} - g_{j(t-1)} - (\tilde w_{jt} - w_{jt}) = 0, \quad \forall j \in J,\ t \in T; \\
& \tilde w_{jt} + w_{jt} \le 1, \quad \forall j \in J,\ t \in T; \\
& u_{jt} - u_{j(t-1)} - (\tilde y_{jt} - y_{jt}) = 0, \quad \forall j \in J,\ t \in T; \\
& \tilde y_{jt} + y_{jt} \le 1, \quad \forall j \in J,\ t \in T; \\
& \tilde g_{jt} + u_{kt} \le 1, \quad \forall j, k \in J,\ t \in T; \\
& \sum_{j \in J} u_{jt} \le n - 1, \quad \forall t \in T; \\
& p_{jt} - \varphi(q_{jt}) = 0, \quad \forall j \in J,\ t \in T.
\end{array}$$

Fig. 6. Hydro UC: shapes of φ(q_jt) for the three instances: (a) hydro 1, (b) hydro 2, (c) hydro 3. [Plots omitted.]

Our computational results are reported in Tables 3 and 4. We observe good performance of SC-MINLP. It is able to find the global optimum of the three instances within the time limit, but

Table 3 Results for Hydro Unit Commitment and Scheduling problem

| instance | var/int/cons | iter # | LB | UB | int change | time MINLP | # br added |
| hydro 1 | 324/142/445 | 1 | -10,231.039 | -10,140.763 | - | 18.02 | - |
|  | 332/146/449 | 2 | -10,140.760 | -10,140.763 | no | 23.62 | 4 |
| hydro 2 | 324/142/445 | 1 | -3,950.697 | -3,891.224 | - | 21.73 | - |
| ... |  | 2 | -3,950.583 | -3,891.224 | no | 21.34 | 2 |
| ... |  | 3 | -3,950.583 | -3,891.224 | no | 27.86 | 2 |
|  | 336/148/451 | 4 | -3,932.182 | -3,932.182 | no | 38.20 | 2 |
| hydro 3 | 324/142/445 | 1 | -4,753.849 | -4,634.409 | - | 59.33 | - |
| ... |  | 2 | -4,719.927 | -4,660.189 | no | 96.93 | 4 |
|  | 336/148/451 | 3 | -4,710.734 | -4,710.734 | yes | 101.57 | 2 |

Table 4 Results for Hydro Unit Commitment and Scheduling problem

| instance | var/int/cons original | SC-MINLP time (LB) | SC-MINLP UB | COUENNE time (LB) | COUENNE UB | BONMIN 1 time | BONMIN 1 UB | BONMIN 50 time | BONMIN 50 UB |
| hydro 1 | 124/62/165 | 107.77 | -10,140.763 | (-11,229.80) | -10,140.763 | 5.03 | -10,140.763 | 5.75 | -7,620.435 |
| hydro 2 | 124/62/165 | 211.79 | -3,932.182 | (-12,104.40) | -2,910.910 | 4.63 | -3,928.139 | 7.02 | -3,201.780 |
| hydro 3 | 124/62/165 | 337.77 | -4,710.734 | (-12,104.40) | -3,703.070 | 5.12 | -4,131.095 | 13.76 | -3,951.199 |

COUENNE does not solve any of the instances to global optimality. BONMIN 1 and BONMIN 50 also show good performance: a good solution is often found within a few seconds, and BONMIN 1 finds the global optimum in one case.

3.3. Nonlinear Continuous Knapsack problem. In this subsection, we present results for the Nonlinear Continuous Knapsack problem. This purely continuous problem can be motivated as a continuous resource-allocation problem. A limited resource of amount C (such as advertising dollars) has to be partitioned among different categories j ∈ N (such as advertisements for different products). The objective function is the overall return from all categories. The non-convexity arises because a small amount of allocation provides only a small return, up to a threshold at which the advertisements are noticed by consumers and result in substantial sales. On the other hand, at some point saturation sets in, and an increase of advertisement no longer leads to a significant increase in sales. Our model is the non-convex NLP problem:

$$\begin{array}{ll}
\min & -\sum_{j \in N} p_j \\
\text{subject to} & p_j - g_j(x_j) \le 0, \quad \forall j \in N; \\
& \sum_{j \in N} x_j \le C; \\
& 0 \le x_j \le U, \quad \forall j \in N,
\end{array}$$

where $g_j(x_j) = c_j \big/ \big(1 + b_j\, e^{-a_j (x_j + d_j)}\big)$.

Fig. 7. Example shapes of g_j(x_j) for instance nck 20 100. [Plots omitted: four sigmoid-shaped panels on [0, 100].]

Note that the sigmoid function g_j(x_j) is increasing, and it is convex up to the inflection point at x_j = −d_j + ln(b_j)/a_j, whereupon it becomes concave.

The instances were generated randomly. In particular, independently for each j ∈ N, a_j was uniformly generated in the interval [0.1, 0.2], b_j and c_j in the interval [0, U], and d_j in the interval [−U, 0]. The name of each instance contains information about the values of |N| and C, namely, nck |N| C.

Our computational results are reported in Tables 5 and 6. SC-MINLP finds the global optimum for all 6 instances in less than 3 minutes. COUENNE is able to close the gap for only 2 instances within the time limit. BONMIN 1 and BONMIN 50 terminate quickly, but the global optimum is found for only 1 instance by BONMIN 1 and 2 instances by BONMIN 50.
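The location of the inflection point x_j = −d_j + ln(b_j)/a_j (where g_j switches from convex to concave) is easy to verify numerically. The sketch below uses hypothetical coefficients drawn from the stated ranges with U = 100; it is an illustration written for this text, not the paper's instance generator:

```python
import math

def g(x, a, b, c, d):
    # Sigmoid return function from the Nonlinear Continuous Knapsack model:
    # g_j(x_j) = c_j / (1 + b_j * exp(-a_j * (x_j + d_j)))
    return c / (1.0 + b * math.exp(-a * (x + d)))

def d2(f, x, h=1e-3):
    # Central finite-difference approximation of the second derivative.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Hypothetical coefficients in the stated ranges (a in [0.1, 0.2], b and c
# in [0, U], d in [-U, 0] with U = 100).
a, b, c, d = 0.15, 5.0, 10.0, -30.0
x_star = -d + math.log(b) / a      # claimed inflection point

f = lambda x: g(x, a, b, c, d)
assert d2(f, x_star - 1.0) > 0     # convex below the inflection point
assert d2(f, x_star + 1.0) < 0     # concave above it
assert abs(d2(f, x_star)) < 1e-6   # curvature vanishes at x_star
```

So each g_j splits into exactly one convex and one concave subinterval, which is why these instances need few breakpoints per function.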

Table 5 Results for Nonlinear Continuous Knapsack problem

| instance | var/int/cons | iter # | LB | UB | int change | time MINLP | # br added |
| nck 20 100 | 144/32/205 | 1 | -162.444 | -159.444 | - | 0.49 | - |
|  | 146/33/206 | 2 | -159.444 | -159.444 | - | 0.94 | 1 |
| nck 20 200 | 144/32/205 | 1 | -244.015 | -238.053 | - | 0.67 | - |
| ... |  | 2 | -241.805 | -238.053 | - | 0.83 | 1 |
| ... |  | 3 | -241.348 | -238.053 | - | 1.16 | 1 |
| ... |  | 4 | -240.518 | -238.053 | - | 1.35 | 1 |
| ... |  | 5 | -239.865 | -238.053 | - | 1.56 | 1 |
| ... |  | 6 | -239.744 | -238.053 | - | 1.68 | 1 |
|  | 156/38/211 | 7 | -239.125 | -239.125 | - | 1.81 | 1 |
| nck 20 450 | 144/32/205 | 1 | -391.499 | -391.337 | - | 0.79 | - |
|  | 146/32/206 | 2 | -391.364 | -391.337 | - | 0.87 | 1 |
| nck 50 400 | 356/78/507 | 1 | -518.121 | -516.947 | - | 4.51 | - |
| ... |  | 2 | -518.057 | -516.947 | - | 14.94 | 2 |
| ... |  | 3 | -517.837 | -516.947 | - | 23.75 | 2 |
| ... |  | 4 | -517.054 | -516.947 | - | 25.07 | 2 |
|  | 372/86/515 | 5 | -516.947 | -516.947 | - | 31.73 | 2 |
| nck 100 35 | 734/167/1035 | 1 | -83.580 | -79.060 | - | 3.72 | - |
| ... |  | 2 | -82.126 | -81.638 | - | 21.70 | 2 |
| ... |  | 3 | -82.077 | -81.638 | - | 6.45 | 2 |
|  | 744/172/1040 | 4 | -81.638 | -81.638 | - | 11.19 | 1 |
| nck 100 80 | 734/167/1035 | 1 | -174.841 | -171.024 | - | 6.25 | - |
| ... |  | 2 | -173.586 | -172.631 | - | 24.71 | 2 |
|  | 742/171/1039 | 3 | -172.632 | -172.632 | - | 12.85 | 2 |

Table 6 Results for Nonlinear Continuous Knapsack problem

| instance | var/int/cons original | SC-MINLP time (LB) | SC-MINLP UB | COUENNE time (LB) | COUENNE UB | BONMIN 1 time | BONMIN 1 UB | BONMIN 50 time | BONMIN 50 UB |
| nck 20 100 | 40/0/21 | 15.76 | -159.444 | 3.29 | -159.444 | 0.02 | -159.444 | 1.10 | -159.444 |
| nck 20 200 | 40/0/21 | 23.76 | -239.125 | (-352.86) | -238.053 | 0.03 | -238.053 | 0.97 | -239.125 |
| nck 20 450 | 40/0/21 | 15.52 | -391.337 | (-474.606) | -383.149 | 0.07 | -348.460 | 0.84 | -385.546 |
| nck 50 400 | 100/0/51 | 134.25 | -516.947 | (-1020.73) | -497.665 | 0.08 | -438.664 | 2.49 | -512.442 |
| nck 100 35 | 200/0/101 | 110.25 | -81.638 | 90.32 | -81.638 | 0.04 | -79.060 | 16.37 | -79.060 |
| nck 100 80 | 200/0/101 | 109.22 | -172.632 | (-450.779) | -172.632 | 0.04 | -159.462 | 15.97 | -171.024 |

3.4. GLOBALLib and MINLPLib instances. Our algorithm is suitable for problems whose non-convexities are all manifested as sums of univariate non-convex functions; moreover, the variables appearing in the univariate non-convex functions should be bounded. We selected 16 instances from GLOBALLib [11] and 9 from MINLPLib [15] to test our algorithm. These instances were selected because they were easily reformulated to have these properties. Note that we are not advocating the solution of general global optimization problems via reformulation as problems of the form P, followed by application of our algorithm SC-MINLP; we simply produced these reformulations to enlarge our set of test problems.

To reformulate these instances so as to have only univariate non-convex functions, we used as a starting point the reformulation performed by COUENNE [2]. It reformulates MINLP problems in terms of the following "basic operators": sum, product, quotient, power, exp, log, sin, cos, abs. The only non-univariate basic operators are product and quotient (the power x^y is converted to e^{y log(x)}). In these cases we modified the reformulation of COUENNE so as to have only univariate non-convexities:
• product: xy is transformed into (w_1^2 − w_2^2)/4 with w_1 = x + y and w_2 = x − y;
• quotient: x/y is transformed into xw with w = 1/y, and xw is then treated as any other product.

We report in the first part of the table the GLOBALLib results; the second part of the table reports the MINLPLib results. The instances were selected among the ones reported in the computational results of a recent paper [2]. Some of these MINLPLib instances are originally convex, but their reformulation is non-convex. It is clear that the employment of solvers specific to convex MINLPs, such as BONMIN, is much more efficient and advisable for this kind of problem; however, we used the non-convex reformulation of these instances to extend our test bed.
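The product rewrite is a simple algebraic identity and can be sanity-checked numerically. The sketch below is ours (the function names are not COUENNE's); it verifies both the product and the quotient rewrites:

```python
import random

def split_product(x, y):
    # Rewrite used in the text: the bivariate term x*y becomes
    # (w1^2 - w2^2) / 4 with w1 = x + y and w2 = x - y, so the only remaining
    # non-convexities are the univariate squares w1^2 and w2^2.
    w1, w2 = x + y, x - y
    return (w1 ** 2 - w2 ** 2) / 4.0

def split_quotient(x, y):
    # x / y becomes x * w with w = 1 / y; the product x * w is then split
    # exactly as above.
    return split_product(x, 1.0 / y)

random.seed(42)
for _ in range(1000):
    x = random.uniform(-10.0, 10.0)
    y = random.uniform(0.1, 10.0)   # keep y away from zero for the quotient
    assert abs(split_product(x, y) - x * y) < 1e-9
    assert abs(split_quotient(x, y) - x / y) < 1e-9
```

After this rewrite, every non-convex term is univariate, so the convexity/concavity detection of Section 3 applies directly.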
In Table 8, the second column reports the number of variables, integer variables, and constraints of the reformulated problem.

Table 7 Results for GLOBALLib and MINLPLib

| instance | var/int/cons reformulated | iter # | LB | UB | int change | time MINLP | # br added |
| ex14 2 1 | 239/39/358 | 1 | 0.000 | 0.000 | - | 5.35 | - |
| ex14 2 2 | 110/18/165 | 1 | 0.000 | 0.000 | - | 3.33 | - |
| ex14 2 6 | 323/53/428 | 1 | 0.000 | 0.000 | - | 7.02 | - |
| ex14 2 7 | 541/88/808 | 1 | 0.000 | 0.000 | - | 1.06 | - |
| ex2 1 1 | 27/5/38 | 1 | -18.900 | -16.500 | - | 0.01 | - |
| ... |  | 2 | -18.318 | -16.500 | - | 0.06 | 1 |
| ... |  | 3 | -18.214 | -16.500 | - | 0.12 | 1 |
| ... |  | 4 | -18.000 | -16.500 | - | 0.15 | 1 |
| ... |  | 5 | -17.625 | -17.000 | - | 0.22 | 2 |
|  | 39/11/44 | 6 | -17.000 | -17.000 | - | 0.26 | 1 |
| ex2 1 2 | 29/5/40 | 1 | -213.000 | -213.000 | - | 0.01 | - |
| ex2 1 3 | 36/4/41 | 1 | -15.000 | -15.000 | - | 0.00 | - |
| ex2 1 4 | 15/1/16 | 1 | -11.000 | -11.000 | - | 0.00 | - |
| ex2 1 5 | 50/7/72 | 1 | -269.453 | -268.015 | - | 0.01 | - |
|  | 54/9/74 | 2 | -268.015 | -268.015 | - | 0.15 | 2 |
| ex2 1 6 | 56/10/81 | 1 | -44.400 | -29.400 | - | 0.01 | - |
| ... |  | 2 | -40.500 | -39.000 | - | 0.16 | 2 |
| ... |  | 3 | -40.158 | -39.000 | - | 0.25 | 1 |
|  | 66/15/86 | 4 | -39.000 | -39.000 | - | 0.52 | 2 |
| ex2 1 7 | 131/20/181 | 1 | -13,698.362 | -4,105.278 | - | 0.07 | - |
| ... |  | 2 | -10,643.558 | -4,105.278 | - | 1.34 | 11 |
| ... |  | 3 | -8,219.738 | -4,105.278 | - | 2.68 | 5 |
| ... |  | 4 | -6,750.114 | -4,105.278 | - | 4.66 | 10 |
| ... |  | 5 | -5,450.142 | -4,105.278 | - | 16.42 | 11 |
| ... |  | 6 | -5,014.019 | -4,105.278 | - | 32.40 | 6 |
| ... |  | 7 | -4,740.743 | -4,105.278 | - | 38.61 | 15 |
| ... |  | 8 | -4,339.192 | -4,150.410 | - | 86.47 | 4 |
| ... |  | 9 | -4,309.425 | -4,150.410 | - | 133.76 | 11 |
| ... |  | 10 | -4,250.248 | -4,150.410 | - | 240.50 | 6 |
| ... |  | 11 | -4,156.125 | -4,150.410 | - | 333.08 | 5 |
|  | 309/109/270 | 12 | -4,150.411 | -4,150.410 | - | 476.47 | 5 |
| ex9 2 2 | 68/14/97 | 1 | 55.556 | 100.000 | - | 0.00 | - |
| ... |  | 2 | 78.679 | 100.000 | - | 0.83 | 10 |
| ... |  | 3 | 95.063 | 100.000 | - | 5.77 | 19 |
| ... |  | 4 | 98.742 | 100.000 | - | 18.51 | 8 |
| ... |  | 5 | 99.684 | 100.000 | - | 37.14 | 11 |
|  | 184/72/155 | 6 | 99.960 | 100.000 | - | 56.00 | 10 |
| ex9 2 3 | 90/18/125 | 1 | -30.000 | 0.000 | - | 0.00 | - |
| ... |  | 2 | -30.000 | 0.000 | - | 2.29 | 12 |
| ... |  | 3 | -30.000 | 0.000 | - | 6.30 | 12 |
| ... |  | 4 | -30.000 | 0.000 | - | 8.48 | 12 |
| ... |  | 5 | -24.318 | 0.000 | - | 30.44 | 22 |
| ... |  | 6 | -23.393 | 0.000 | - | 16.25 | 10 |
| ... |  | 7 | -21.831 | 0.000 | - | 31.92 | 12 |
| ... |  | 8 | -19.282 | 0.000 | - | 26.53 | 12 |
| ... |  | 9 | -7.724 | 0.000 | - | 203.14 | 17 |
|  | 332/139/246 | 10 | -0.000 | 0.000 | - | 5,426.48 | 12 |
| ex9 2 6 | 105/22/149 | 1 | -1.500 | 3.000 | - | 0.02 | - |
| ... |  | 2 | -1.500 | 1.000 | - | 9.30 | 28 |
| ... |  | 3 | -1.500 | 0.544 | - | 34.29 | 15 |
| ... |  | 4 | -1.500 | -1.000 | - | 48.33 | 12 |
| ... |  | 5 | -1.500 | -1.000 | - | 130.04 | 22 |
| ... |  | 6 | -1.500 | -1.000 | - | 50.29 | 24 |
| ... |  | 7 | -1.500 | -1.000 | - | 53.66 | 13 |
| ... |  | 8 | -1.500 | -1.000 | - | 67.93 | 22 |
| ... |  | 9 | -1.500 | -1.000 | - | 103.13 | 14 |
| ... |  | 10 | -1.366 | -1.000 | - | 127.60 | 12 |
| ... |  | 11 | -1.253 | -1.000 | - | 1,106.78 | 22 |
| ... |  | 12 | -1.116 | -1.000 | - | 3,577.48 | 13 |
| ... |  | 13 | -1.003 | -1.000 | - | 587.11 | 15 |
|  | 557/248/375 | 14 | -1.003 | -1.000 | - | 1,181.12 | 14 |
| ex5 2 5 | 802/120/1,149 | 1 | -34,200.833 | -3,500.000 | - | 0.10 | - |
|  | 1,280/359/1,388 | 2 | -23,563.500 | -3,500.000 | - | 7,192.22 | 239 |
| ex5 3 3 | 1,025/181/1,474 | 1 | -67,499.021 | - | - | 0.17 | - |
|  | 1,333/335/1,628 | 2 | -16,872.900 | 3.056 | - | 7,187.11 | 154 |
| du-opt | 458/18/546 | 1 | 3.556 | 3.556 | - | 5.30 | - |
| du-opt5 | 455/15/545 | 1 | 8.073 | 8.073 | - | 9.38 | - |
| fo7 | 366/42/268 | 1 | 8.759 | 22.518 | - | 7,200 | - |
| m6 | 278/30/206 | 1 | 82.256 | 82.256 | - | 182.72 | - |
| no7 ar2 1 | 421/41/325 | 1 | 90.583 | 127.774 | - | 7,200 | - |
| no7 ar3 1 | 421/41/325 | 1 | 81.539 | 107.869 | - | 7,200 | - |
| no7 ar4 1 | 421/41/325 | 1 | 76.402 | 104.534 | - | 7,200 | - |
| o7 2 | 366/42/268 | 1 | 79.365 | 124.324 | - | 7,200 | - |
| stockcycle | 626/432/290 | 1 | 119,948.675 | 119,948.676 | - | 244.67 | - |

Table 8 shows that COUENNE performs much better than SC-MINLP on the GLOBALLib instances that we tested. But on such small instances COUENNE behaves very well in general, so there cannot be much to recommend any alternative algorithm. BONMIN 1 and BONMIN 50 also perform well: they find global optima rather quickly in 10 (resp. 13) out of the 16 GLOBALLib instances; however, in 3 (resp. 1) instances they fail to produce a feasible solution. Concerning the MINLPLib instances, 4 out of 9 instances are solved to global optimality by SC-MINLP. COUENNE finishes within the imposed time limit on precisely the same 4 instances (in one case, COUENNE with default settings incorrectly returned a solution with a worse objective function). On the 5 instances not solved by either solver within the imposed time limit, the lower bound given by SC-MINLP is always better (higher) than the one provided by COUENNE. This

Table 8 Results for GLOBALLib and MINLPLib

| instance | var/int/cons reformulated | SC-MINLP time (LB) | SC-MINLP UB | COUENNE time (LB) | COUENNE UB | BONMIN 1 time | BONMIN 1 UB | BONMIN 50 time | BONMIN 50 UB |
| ex14 2 1 | 122/0/124 | 21.81 | 0.000 | 0.18 | 0.000 | 0.13 | 0.000 | 46.25 | 0.000 |
| ex14 2 2 | 56/0/57 | 12.86 | 0.000 | 0.10 | 0.000 | 0.05 | 0.000 | 28.02 | 0.000 |
| ex14 2 6 | 164/0/166 | 42.56 | 0.000 | 1.03 | 0.000 | 1.25 | 0.000 | 138.02 | 0.000 |
| ex14 2 7 | 277/0/280 | 128.70 | 0.000 | 0.63 | 0.000 | 0.28 | 0.000 | 170.62 | 0.000 |
| ex2 1 1 | 12/0/8 | 2.84 | -17.000 | 0.16 | -17.000 | 0.04 | -11.925 | 0.89 | -17.000 |
| ex2 1 2 | 14/0/10 | 1.75 | -213.000 | 0.06 | -213.000 | 0.02 | -213.000 | 0.66 | -213.000 |
| ex2 1 3 | 24/0/17 | 1.62 | -15.000 | 0.04 | -15.000 | 0.02 | -15.000 | 0.36 | -15.000 |
| ex2 1 4 | 12/0/10 | 1.28 | -11.000 | 0.04 | -11.000 | 0.03 | -11.000 | 0.58 | -11.000 |
| ex2 1 5 | 29/0/30 | 2.21 | -268.015 | 0.13 | -268.015 | 0.03 | -268.015 | 0.96 | -268.015 |
| ex2 1 6 | 26/0/21 | 3.31 | -39.000 | 0.12 | -39.000 | 0.04 | -39.000 | 1.11 | -39.000 |
| ex2 1 7 | 71/0/61 | 1,388.00 | -4,150.410 | 3.98 | -4,150.410 | 0.04 | -4,150.410 | 13.64 | -4,150.410 |
| ex9 2 2 | 38/0/37 | 1,355.47 | 100.000 | 0.36 | 100.000 | 0.12 | - | 2.65 | 100.000 |
| ex9 2 3 | 54/0/53 | 5,755.76 | 0.000 | 0.73 | 0.000 | 0.08 | 15.000 | 2.85 | 5.000 |
| ex9 2 6 | 57/0/53 | (-1.003) | -1.000 | 0.78 | -1.000 | 0.21 | - | 4.04 | -1.000 |
| ex5 2 5 | 442/0/429 | (-23,563.500) | -3,500.000 | (-11,975.600) | -3,500.000 | 2.06 | -3,500.000 | 315.53 | -3,500.000 |
| ex5 3 3 | 563/0/550 | (-16,872.900) | 3.056 | (-16,895.400) | 3.056 | 20.30 | - | 22.92 | - |
| du-opt | 242/18/230 | 13.52 | 3.556 | 38.04 | 3.556 | 41.89 | 3.556 | 289.88 | 3.556 |
| du-opt5 | 239/15/227 | 17.56 | 8.073 | 37.96 | 8.073 | 72.32 | 8.118 | 350.06 | 8.115 |
| fo7 | 338/42/437 | (8.759) | 22.518 | (1.95) | 22.833 | - | 22.518 | - | 24.380 |
| m6 | 254/30/327 | 185.13 | 82.256 | 54.13 | 82.256 | 154.42 | 82.256 | 211.49 | 82.256 |
| no7 ar2 1 | 394/41/551 | (90.583) | 127.774 | (73.78) | 111.141 | - | 122.313 | - | 107.871 |
| no7 ar3 1 | 394/41/551 | (81.539) | 107.869 | (42.19) | 113.810 | - | 121.955 | - | 119.092 |
| no7 ar4 1 | 394/41/551 | (76.402) | 104.534 | (54.85) | 98.518 | - | 117.992 | - | 124.217 |
| o7 2 | 338/42/437 | (79.365) | 124.324 | (5.85) | 133.988 | - | 126.674 | - | 130.241 |
| stockcycle | 578/480/195 | 251.54 | 119,948.676 | 84.35 | 119,948.676* | 321.89 | 119,948.676 | 328.03 | 119,948.676 |

* This time and the correct solution were obtained with non-default options of COUENNE (which failed with default settings).

This result emphasizes the quality of the lower bound computed by the solution of the convex MINLP relaxation Q. Moreover, the upper bound computed by SC-MINLP is better in 2 instances out of 5. Concerning the performance of BONMIN 1 and BONMIN 50, when they terminate before the time limit is reached (4 instances out of 9), the solution computed is globally optimal for 3 instances. Note that in these cases, the CPU time needed by BONMIN 1 and BONMIN 50 is often greater than that needed by SC-MINLP. When the time limit is reached, BONMIN 1 computes a better solution than SC-MINLP in only 1 instance out of 5 (for 1 instance they provide the same solution), and BONMIN 50 computes a better solution than SC-MINLP in only 1 instance out of 5. Finally, we note that for several instances, in just the second iteration of SC-MINLP, the convex MINLP solver BONMIN was still in the midst of working when our self-imposed time limit of 2 hours was reached. In many of these cases, much better lower bounds could be achieved by increasing the time limit. Generally, substantial improvements in BONMIN would produce a corresponding improvement in the results obtained with SC-MINLP.

4. Summary. In this paper, we proposed an algorithm for solving a broad class of separable MINLPs to global optimality. Our simple algorithm, implemented within the AMPL modeling language, works with a lower-bounding convex MINLP relaxation and an upper-bounding non-convex NLP restriction. For the definition of the lower-bounding problem, we identify subintervals of convexity and concavity for the univariate functions using external calls to MATLAB; we then develop a convex MINLP relaxation of the problem, approximating the concave intervals of each non-convex function with a linear relaxation. The subintervals are glued together using binary variables. We iteratively refine our convex MINLP relaxation by modifying it at the modeling level. The upper bound is obtained by fixing the integer variables in the original non-convex MINLP and then locally solving the associated non-convex NLP restriction. We presented preliminary computational experiments on models from a variety of application areas, including problems from GLOBALLib and MINLPLib. We compared our algorithm with the open-source solvers COUENNE, as an exact approach, and BONMIN, as a heuristic approach, obtaining significant success.

Acknowledgment. This work was partially developed when the first author was visiting the IBM T.J. Watson Research Center, and their support is gratefully acknowledged. We also thank Pietro Belotti for discussions about the modification of COUENNE to reformulate GLOBALLib and MINLPLib instances.

REFERENCES

[1] E. Beale and J. Tomlin, Special facilities in a general mathematical programming system for non-convex problems using ordered sets of variables, in Proc. of the 5th Int. Conf. on Operations Research, J. Lawrence, ed., 1970, pp. 447–454.
[2] P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter, Branching and bounds tightening techniques for non-convex MINLP, 2008. IBM Research Report RC24620; to appear in: Optimization Methods and Software.
[3] P. Bonami, L. Biegler, A. Conn, G. Cornuéjols, I. Grossmann, C. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter, An algorithmic framework for convex mixed integer nonlinear programs, Discrete Optimization, 5 (2008), pp. 186–204.
[4] BONMIN. projects.coin-or.org/Bonmin, v. 1.0.
[5] A. Borghetti, C. D'Ambrosio, A. Lodi, and S. Martello, An MILP approach for short-term hydro scheduling and unit commitment with head-dependent reservoir, IEEE Transactions on Power Systems, 23 (2008), pp. 1115–1124.
[6] COUENNE. projects.coin-or.org/Couenne, v. 0.1.
[7] C. D'Ambrosio, J. Lee, and A. Wächter, A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity, in Proc. of the 17th Annual European Symposium on Algorithms (ESA), Copenhagen, Denmark, Lecture Notes in Computer Science, A. Fiat, ed., 2009 (to appear).
[8] M. Duran and I. Grossmann, An outer-approximation algorithm for a class of mixed-integer nonlinear programs, Mathematical Programming, 36 (1986), pp. 307–339.
[9] R. Fletcher and S. Leyffer, Solving mixed integer nonlinear programs by outer approximation, Mathematical Programming, 66 (1994), pp. 327–349.
[10] R. Fourer, D. Gay, and B. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Duxbury Press/Brooks/Cole Publishing Co., second ed., 2003.
[11] GLOBALLib. www.gamsworld.org/global/globallib.htm.
[12] O. Günlük, J. Lee, and R. Weismantel, MINLP strengthening for separable convex quadratic transportation-cost UFL, 2007.
IBM Research Report RC24213. [13] L. Liberti, Writing global optimization software, in Global Optimization: From Theory to Implementation, L. Liberti and N. Maculan, eds., Springer, Berlin, 2006, pp. 211–262. [14] MATLAB. www.mathworks.com/products/matlab/, R2007a. [15] MINLPLib. www.gamsworld.org/minlp/minlplib.htm. [16] I. Nowak, H. Alperin, and S. Vigerske, LaGO – an object oriented library for solving MINLPs, in Global Optimization and Constraint Satisfaction, vol. 2861 of Lecture Notes in Computer Science, Springer, Berlin Heidelberg, 2003, pp. 32–42. [17] I. Quesada and I. Grossmann, An LP/NLP based branch and bound algorithm for convex MINLP optimiza- tion problems, Computer & Chemical Engineering, 16 (1992), pp. 937–947. [18] N. Sahinidis, BARON: A general purpose global optimization software package, J. Global Opt., 8 (1996), pp. 201–205. [19] A. Wachter¨ and L. T. Biegler, On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming, Mathematical Programming, 106 (2006), pp. 25–57. J Glob Optim DOI 10.1007/s10898-016-0443-3

Monomial-wise optimal separable underestimators for mixed-integer polynomial optimization

Christoph Buchheim1 · Claudia D’Ambrosio2

Received: 21 January 2016 / Accepted: 12 May 2016 © Springer Science+Business Media New York 2016

Abstract We introduce a new method for solving box-constrained mixed-integer polynomial problems to global optimality. The approach, a specialized branch-and-bound algorithm, is based on the computation of lower bounds provided by the minimization of separable underestimators of the polynomial objective function. The underestimators are the novelty of the approach, because the standard approaches in global optimization are based on convex relaxations. Thanks to the fact that only simple bound constraints are present, minimizing the separable underestimator can be done very quickly. The underestimators are computed monomial-wise after the original polynomial has been shifted. We show that such valid underestimators exist and that their degree can be bounded when the degree of the polynomial objective function is bounded, too. For the quartic case, all optimal monomial underestimators are determined analytically. We implemented and tested the branch-and-bound algorithm in which these underestimators are hardcoded. The comparison against standard global optimization and polynomial optimization solvers clearly shows that our method outperforms the others, the only exception being the binary case, where it is still competitive. Moreover, our branch-and-bound approach suffers less in the case of dense polynomial objective functions, i.e., polynomials having a large number of monomials. This paper is an extended and revised version of the preliminary paper [4].

Keywords Polynomial optimization · Mixed-integer nonlinear programming · Separable underestimators

1 Introduction

In the last decade, the focus of the mathematical optimization community has increasingly moved towards mixed-integer non-linear programming (MINLP). The simultaneous presence of integrality constraints and non-linearities makes these problems harder than mixed-integer linear programming (MILP) problems, especially when non-convexities come into play. In particular, MINLP problems are typically very challenging both from a theoretical and a practical viewpoint. As an example, minimizing a quadratic function subject to box constraints is an NP-hard problem, both in the case of continuous and integer variables. Different types of convex relaxations and reformulations have been proposed in the literature for such problems. The resulting convex problems may be linear [3], semidefinite [7,24], or other types of tractable problems [8,9]. In fact, the standard way of dealing with general non-convex MINLPs is to relax them so as to obtain convex (often linear) relaxations. The relaxation results in a larger convex feasible set and a convex underestimator of the objective function. We refer to [2,10,12,13] for an overview of standard methods for MINLPs.

(Correspondence: Christoph Buchheim, [email protected]. 1 Fakultät für Mathematik, TU Dortmund, Vogelpothsweg 87, 44227 Dortmund, Germany. 2 LIX CNRS (UMR 7161), École Polytechnique, 91128 Palaiseau Cedex, France.)

Recently, Buchheim and Traversi [6] proposed to relax quadratic combinatorial optimization problems by replacing the objective function by separable non-convex underestimators instead of convex ones. In this paper, we propose a similar approach for a different class of problems, namely the minimization of polynomials subject to box and integrality constraints. More formally, we consider problems of the form

$$\min\; f(x) \quad \text{s.t. } x \in [l,u],\;\; x_i \in \mathbb{Z} \;\;\forall i \in I, \tag{MIPO}$$

where $l, u \in \mathbb{R}^n$ with $l \le u$ and $I \subseteq \{1,\dots,n\}$. We denote $[l,u] := \prod_{i=1}^n [l_i,u_i]$ and assume without loss of generality that $l_i, u_i \in \mathbb{Z}$ for $i \in I$. The objective function $f$ is an arbitrary polynomial of total degree $d \in \mathbb{N}$; it can be written as $f = \sum_{\alpha \in A} c_\alpha x^\alpha$, where $A$ is a finite subset of $\mathbb{N}_0^n$ and $x^\alpha := \prod_{i=1}^n x_i^{\alpha_i}$.
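As a concrete illustration, the sparse form $f = \sum_{\alpha \in A} c_\alpha x^\alpha$ can be encoded as a dictionary mapping exponent tuples to coefficients; this is a minimal sketch with an illustrative helper name (`eval_poly`), not code from the paper:

```python
from math import prod

def eval_poly(coeffs, x):
    """Evaluate f(x) = sum over alpha of c_alpha * x^alpha.

    coeffs maps exponent tuples alpha to coefficients c_alpha.
    """
    return sum(c * prod(xi ** a for xi, a in zip(x, alpha))
               for alpha, c in coeffs.items())

# f(x1, x2) = 2*x1^2*x2 - 3*x1*x2 + 1
f = {(2, 1): 2.0, (1, 1): -3.0, (0, 0): 1.0}
print(eval_poly(f, (1.0, 2.0)))  # 2*2 - 3*2 + 1 = -1.0
```

This encoding is convenient later, since both shifting and monomial-wise underestimation operate on one exponent tuple at a time.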
Few specific methods exist for this kind of problem. In the case of binary variables, the standard approach is to add new variables replacing every monomial appearing in the problem and then linearize them. Another approach is to transform the polynomial problem into a quadratic one [5,20]. When we have general integer variables, we can apply binary expansion to reduce the problem to the previous case. Clearly, the drawback is that a large number of variables is introduced, so the approach is usually not practical.

In the mixed-integer case, the most widely used algorithms combine convex underestimators and bound tightening techniques. Recently, Lasserre and Thanh [17] proposed to underestimate an arbitrary polynomial by a sequence of convex polynomials. The algorithms implemented in the commonly used MINLP solvers (see, for example, [1,18,21,23]) derive convex underestimators from the building blocks of the objective function. In this paper, we propose a similar idea, i.e., we compute underestimators monomial-wise. However, our underestimators are non-convex but separable. Even if we expect convex underestimators to be tighter than separable ones in general, a separable underestimator can be minimized very quickly without needing to relax the integrality constraints. Thus, separable underestimators represent a good trade-off between approximation quality and tractability. These nice properties are confirmed by experimental results. Our approach performs very favorably against standard software for mixed-integer non-linear optimization such as ANTIGONE [18], BARON [23], Couenne [1], and SCIP [21]. We also compared our method against GloptiPoly [15], one of the best performing solvers for continuous polynomial optimization. It is based on hierarchies of semidefinite programming relaxations (see, e.g., [16,19,22]) and does not explicitly take into account integrality requirements.
To circumvent this limitation, it is possible to add polynomial equations encoding integrality, but the resulting problems are intractable even for medium-size instances. On the continuous instances, GloptiPoly performs better than on the mixed or integer ones but, in any case, the semidefinite programs used to underestimate the polynomial optimization problem take a long time to solve.

In Sect. 2 the basic ideas of our approach are introduced. We then discuss the general branch-and-bound framework and its features (Sect. 3). In Sect. 4 we show extensive computational results that confirm the effectiveness of our approach. Finally, we conclude the paper and present some future directions in Sect. 5.

2 Separable underestimators for polynomials

In order to obtain a lower bound for Problem (MIPO), we aim at underestimating its objective function $f : \mathbb{R}^n \to \mathbb{R}$ of degree $d := \deg(f)$ by a separable polynomial $g(x) = \sum_{i=1}^n g_i(x_i)$, where all functions $g_i : \mathbb{R} \to \mathbb{R}$ are univariate polynomials of degree at most $\bar d := 2\lceil \tfrac{d}{2} \rceil$. This is the smallest degree that ensures the existence of a global separable underestimator of $f$; see Sect. 2.2. Such underestimators are easy to minimize, as discussed in the next section.

2.1 Minimizing a separable polynomial

If $g(x) = \sum_{i=1}^n g_i(x_i)$ is a separable polynomial, we clearly have

$$\min \left\{ g(x) : x \in [l,u],\; x_i \in \mathbb{Z} \;\forall i \in I \right\} \;=\; \sum_{i=1}^n \min \left\{ g_i(x_i) : x_i \in [l_i, u_i],\; x_i \in \mathbb{Z} \text{ if } i \in I \right\}.$$

This means that we can minimize the underestimator $g$ by solving $n$ single univariate minimization problems involving the functions $g_i$ separately. The problem is discrete if $i \in I$, continuous otherwise. It is important to stress that we never relax the integrality requirements in our approach; thus we never change the feasible region. In particular, the vector containing the solutions of all univariate problems is a feasible solution of (MIPO). An upper bound can thus be obtained by re-evaluating this solution with respect to the original polynomial function $f$.

To guarantee feasibility, the two types of problems (integer or continuous) are solved in a slightly different way. Let us start with the solution of the continuous univariate problem ($i \notin I$). The minimizer of $g_i$ over $[l_i, u_i]$ must either satisfy the first-order condition $g_i'(x) = 0$ or lie on the boundary. Recall that $g_i'$ is a polynomial of degree at most $\bar d - 1$. Thus, it has at most $\bar d - 1$ real zeroes, and they can be computed by closed formulae if $\bar d \le 5$, numerically otherwise. This means that we have at most $\bar d + 1$ candidates for the minimizer of $g_i$ and that we can identify the minimizer by enumeration. When $i \in I$, i.e., the univariate problem has to satisfy integrality requirements, the integer minimizer must be a continuous minimizer of $g_i$ rounded up or down. Thus, we start by identifying the continuous candidates as explained above, round them up and down, and end up with at most $2\bar d$ candidates. See Fig. 1 for an illustration.

In all cases, we thus obtain a minimizer $x_i^*$ for each $g_i$, such that $x^*$ is a mixed-integer minimizer of $g$. We thus obtain $g(x^*)$ as a lower bound and $f(x^*)$ as an upper bound on (MIPO).
Obviously, the objective is to achieve a small distance $f(x^*) - g(x^*)$, i.e., to compute a tight underestimator $g$. In the next section, we first argue that separable global underestimators exist.
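For integer variables with modest domains, the univariate subproblems can even be solved by plain enumeration; the following sketch (simplified relative to the stationary-point procedure above, with illustrative names) shows the decomposition at work:

```python
def min_separable_integer(gis, l, u):
    """gis[i] is a callable g_i; returns (lower bound, integer minimizer x*)."""
    xstar, bound = [], 0.0
    for gi, li, ui in zip(gis, l, u):
        best = min(range(li, ui + 1), key=gi)  # enumerate the integer domain
        xstar.append(best)
        bound += gi(best)
    return bound, xstar

g1 = lambda t: t**4 - 8 * t**2   # minimized over [-3,3] at t = -2 or 2
g2 = lambda t: (t - 1) ** 2      # minimized over [-3,3] at t = 1
bound, xs = min_separable_integer([g1, g2], [-3, -3], [3, 3])
print(bound, xs)  # -16.0 [-2, 1]
```

The returned vector is feasible for (MIPO), so re-evaluating the original $f$ at it immediately yields an upper bound as described above.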

Fig. 1 A univariate polynomial of degree five, its stationary points (circles), and the candidates for being integer optimizers (squares)

2.2 Global underestimators for arbitrary degree

Since the feasible region $[l,u]$ of Problem (MIPO) is compact, it is clear that separable (even constant) underestimators for any polynomial $f$ over $[l,u]$ exist. We now show that each polynomial $f$ of degree $d$ even admits a global separable underestimator of degree $\bar d$. For this, it suffices to underestimate each monomial of $f$ separately.

Theorem 1 Let $f = c x^\alpha$ be a monomial of even degree $d$. Then

$$f(x) \ge -\frac{|c|}{d} \sum_{i=1}^n \alpha_i x_i^d$$

for all $x \in \mathbb{R}^n$.

Proof As $d$ is even, we obtain

$$|x^\alpha| = \prod_{i=1}^n |x_i|^{\alpha_i} = \left( \prod_{i=1}^n (x_i^d)^{\alpha_i} \right)^{1/d} \le \frac{1}{d} \sum_{i=1}^n \alpha_i x_i^d$$

by the inequality of arithmetic and geometric means. Hence

$$f(x) = c x^\alpha \ge -|c|\,|x^\alpha| \ge -\frac{|c|}{d} \sum_{i=1}^n \alpha_i x_i^d$$

for all $x \in \mathbb{R}^n$.

Theorem 2 Let $f = c x^\alpha$ be a monomial of odd degree $d$. Then

$$f(x) \ge -\frac{|c|}{2d} \sum_{i=1}^n \alpha_i \left( x_i^{d+1} + x_i^{d-1} \right)$$

for all $x \in \mathbb{R}^n$.

Proof As $d-1$ and $d+1$ are even, we obtain

$$|x^\alpha| = \prod_{i=1}^n |x_i|^{\alpha_i} = \left( \prod_{i=1}^n \left( x_i^{d-1} x_i^{d+1} \right)^{\alpha_i} \right)^{1/(2d)} \le \frac{1}{2d} \sum_{i=1}^n \alpha_i \left( x_i^{d-1} + x_i^{d+1} \right)$$

by the inequality of arithmetic and geometric means. Hence

$$f(x) = c x^\alpha \ge -|c|\,|x^\alpha| \ge -\frac{|c|}{2d} \sum_{i=1}^n \alpha_i \left( x_i^{d-1} + x_i^{d+1} \right)$$

for all $x \in \mathbb{R}^n$.
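The bounds of Theorems 1 and 2 are easy to check numerically; the sketch below (with illustrative helper names) samples random points and verifies the inequality:

```python
import random
from math import prod

def monomial_lower_bound(c, alpha, x):
    """Separable lower bound on c * x^alpha from Theorem 1 (d even) / 2 (d odd)."""
    d = sum(alpha)
    if d % 2 == 0:   # Theorem 1
        return -abs(c) / d * sum(a * xi**d for a, xi in zip(alpha, x))
    # Theorem 2
    return -abs(c) / (2 * d) * sum(a * (xi**(d + 1) + xi**(d - 1))
                                   for a, xi in zip(alpha, x))

random.seed(0)
c, alpha = -1.7, (2, 1, 1)   # f = -1.7 * x1^2 * x2 * x3, even degree d = 4
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in alpha]
    f = c * prod(xi**a for xi, a in zip(x, alpha))
    assert f >= monomial_lower_bound(c, alpha, x) - 1e-9
print("ok")
```

Note that the bound holds on all of $\mathbb{R}^n$, not only on the feasible box, which is exactly why these underestimators survive branching unchanged.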

2.3 Computing tight separable underestimators

Now that we know that a global separable underestimator exists and how to minimize it, we discuss how to obtain a best possible one, i.e., one that yields the tightest lower bound. Ideally, we would like to solve the following problem:

$$\max \; \min_{x \in [l,u],\; x_i \in \mathbb{Z} \text{ for } i \in I} g(x) \quad \text{s.t. } g(x) \le f(x) \;\forall x \in [l,u], \;\; g \text{ separable polynomial of degree at most } \bar d.$$

It is evident that solving the problem above is too difficult, as even testing whether $g = 0$ is a valid underestimator for $f$ would require deciding non-negativity of $f$, which is well known to be an intractable problem for general $f$. We thus simplify the problem by considering each monomial separately, i.e., by computing the optimal separable underestimator for each monomial, and obtain a global separable underestimator for the polynomial by summing them up. The resulting underestimator will not be optimal for $f$, but we hope that the provided lower bounds are still reasonably tight (as confirmed by the experimental results presented in Sect. 4).

When summing up monomial-wise underestimators, it is no longer useful to consider the above objective function

$$\min_{x \in [l,u],\; x_i \in \mathbb{Z} \text{ for } i \in I} g(x),$$

as each monomial underestimator may have a different minimizer. With this in mind, we now aim at finding an underestimator that is as close as possible to $f$ on average over the feasible region $[l,u]$. Hence we replace the objective function above by

$$\int_{[l,u]} g(x)\, dx.$$

In summary, we obtain the problem

$$\max \int_{[l,u]} g(x)\, dx \quad \text{s.t. } g(x) \le f(x) \;\forall x \in [l,u], \;\; g \text{ separable polynomial of degree at most } \bar d, \tag{P}$$

where the variables of the problem are the coefficients of the univariate polynomials $g_i$. We can define a variable $c_0$ collecting the constant terms of all $g_i$ and rewrite the separable underestimator as

$$g(x) = \sum_{i=1}^n \sum_{j=1}^{\bar d} c_{ij} x_i^j + c_0.$$

Problem (P) has $n\bar d + 1$ variables, i.e., the coefficients $c_{ij}$ and $c_0$. We then have a linear objective function, as

$$\int_{[l,u]} g(x)\, dx = \sum_{i=1}^n \sum_{j=1}^{\bar d} c_{ij} \int_{[l,u]} x_i^j\, dx + c_0 \int_{[l,u]} 1\, dx.$$

As $f$ is fixed, Problem (P) can thus be modeled as a linear optimization problem over the cone of non-negative polynomials. The dual problem can be calculated as

$$\min \int f\, d\mu \quad \text{s.t. } \int x_i^j\, d\mu = \int_{[l,u]} x_i^j\, dx \;\text{ for } i = 1,\dots,n,\; j = 1,\dots,\bar d, \quad \int 1\, d\mu = \int_{[l,u]} 1\, dx, \quad \mu \text{ Borel measure with support on } [l,u]. \tag{D}$$

The global underestimators introduced in Sect. 2.2 do not constitute optimal solutions of Problem (P) in general, even if $f$ is a monomial. In order to determine optimal monomial underestimators, the first step is to translate the problem such that the origin becomes the center of the feasible region, i.e., we obtain a symmetric box. We can calculate the shifted polynomial $\bar f$ by $\bar f(x) := f(x + t)$, where $t_i := \tfrac12 (l_i + u_i)$ for all $i = 1,\dots,n$. The resulting shifted polynomial has the same degree as $f$ but a different number of monomials. These monomials are considered separately, and a Problem (P) is solved for each of them over the new box $[l-t, u-t]$. The separable underestimator of $\bar f$, say $\bar g$, is obtained by summing up all monomial underestimators. We finally get the separable underestimator for $f$ by shifting back, using $g(x) := \bar g(x - t)$.

By shifting we thus get a box $[l-t, u-t]$ where $l - t = -(u - t)$ by definition of $t$. Consequently, we can assume that the feasible set in Problem (P) is symmetric with respect to the origin, i.e., that $l = -u$. As $f$ is a monomial in our approach, we are allowed to scale the feasible box to normalize further:

Lemma 1 Let $u > 0$ and let $\sum_{i=1}^n \sum_{j=1}^{\bar d} c_{ij} x_i^j + c_0$ be an optimal solution to (P) over $[-1,1]^n$ for a monomial $f = c x^\alpha$ with $c \neq 0$. Then

$$|c^{-1} f(u)| \cdot \left( \sum_{i=1}^n \sum_{j=1}^{\bar d} \frac{c_{ij}}{u_i^j}\, x_i^j + c_0 \right)$$

is an optimal solution to (P) over $[-u, u]$.

The proof is straightforward. Note that we can assume $u > 0$ without loss of generality: if $u_i = 0$ for some $i$, the corresponding variable $x_i$ can be eliminated. Thus the above lemma is always applicable.
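The shift step $\bar f(x) := f(x+t)$ amounts to re-expanding each monomial via the binomial theorem; a minimal sketch, assuming the dictionary-of-monomials encoding and the illustrative name `shift_poly`:

```python
from math import comb
from collections import defaultdict

def shift_poly(coeffs, t):
    """Return the coefficients of f(x + t) from those of f."""
    shifted = defaultdict(float)
    for alpha, c in coeffs.items():
        # expand c * prod_i (x_i + t_i)^{alpha_i} term by term
        terms = [((), c)]
        for ai, ti in zip(alpha, t):
            terms = [(beta + (k,), w * comb(ai, k) * ti**(ai - k))
                     for beta, w in terms for k in range(ai + 1)]
        for beta, w in terms:
            shifted[beta] += w
    return dict(shifted)

# f(x) = x1^2 shifted by t = (1,): (x1 + 1)^2 = 1 + 2*x1 + x1^2
print(sorted(shift_poly({(2,): 1.0}, (1.0,)).items()))
# [((0,), 1.0), ((1,), 2.0), ((2,), 1.0)]
```

As noted below, the number of monomials grows under shifting, but only by a factor depending on the (fixed) degree $d$.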
It is a crucial observation for our approach that Problem (P), when applied to single monomials, only involves at most $\bar d$ different variables. This means that when we fix a bound on the degree of $f$ (and thus of $g$) we have to solve only a fixed number of different problems

of type (P), namely those for $f(x) := x^\alpha$ and $f(x) := -x^\alpha$ over $[-1,1]^{\bar d}$ for each possible monomial $x^\alpha$ with $|\alpha| \le \bar d$. This implies that we do not need to compute the best underestimators of each monomial at each node of the branch-and-bound tree; it suffices to solve Problem (P) offline once for each relevant monomial and to hardcode the results. Then, online, we just need to replace each monomial in $\bar f$ by its previously computed optimal underestimator. The steps to perform to get the underestimator of $f$ are just shift, scale, replace, sum up the single underestimators, and scale and shift back. If the total degree of $f$ is bounded, the computation of the underestimator can thus be performed in time linear in the number of monomials. Note that the number of monomials after shifting is linear in the original number of monomials when the degree $d$ is fixed, but the factor grows exponentially in $d$. Summarizing the discussion so far, for a given maximum degree $d$, it thus remains to solve Problem (P) for each monomial $f$ of degree at most $\bar d$ over $[-1,1]$. An important tool for this is the following result:

Lemma 2 (Complementary Slackness) Let $g$ be feasible for (P) and let $\mu$ be feasible for (D). If $\int (f - g)\, d\mu = 0$, then $g$ is optimal for (P).

We will use this observation in the next section in order to determine all optimal monomial underestimators for degree $d \le 4$.
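The scaling step of Lemma 1 can be sketched as follows, assuming the coefficients $c_{ij}$ of an underestimator over $[-1,1]^n$ are stored per variable and per degree (names are illustrative):

```python
from math import prod

def scale_underestimator(cij, c0, alpha, u):
    """Rescale an optimal underestimator for c*x^alpha from [-1,1]^n to [-u,u].

    cij[i][j-1] is the coefficient of x_i^j over [-1,1]^n; the common factor
    is |c^{-1} f(u)| = prod_i u_i^{alpha_i}, which is positive for u > 0.
    """
    factor = prod(ui**a for ui, a in zip(u, alpha))
    scaled = [[factor * cij[i][j] / u[i]**(j + 1)
               for j in range(len(cij[i]))] for i in range(len(u))]
    return scaled, factor * c0

# For f = x1^2 the optimal underestimator over [-1,1] is x1^2 itself;
# scaling to u = (3,) must leave it unchanged.
print(scale_underestimator([[0.0, 1.0]], 0.0, (2,), (3.0,)))
# ([[0.0, 1.0]], 0.0)
```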

2.4 Optimal underestimators for quartic monomials

In the following, we focus on monomials of degree at most four and determine their optimal underestimators according to (P). For the proofs we use Lemma 2, where a corresponding feasible solution for the dual problem (D) is constructed explicitly.

Up to symmetry, 12 monomials have to be considered but, because the cases of positive and negative coefficients have to be distinguished, we end up with 24 cases, namely the monomials $\pm x_1^k$ (for $k = 0,\dots,4$), $\pm x_1 x_2$, $\pm x_1^2 x_2$, $\pm x_1 x_2 x_3$, $\pm x_1^3 x_2$, $\pm x_1^2 x_2^2$, $\pm x_1^2 x_2 x_3$, and $\pm x_1 x_2 x_3 x_4$. The first ten are already separable and hence trivial. The following result covers a further nine cases:

Theorem 3 Let $\alpha \in \mathbb{N}_0^n$ such that $d := \sum_{i=1}^n \alpha_i$ is even and $d \le 4$. Assume that at least one $\alpha_i$ is odd, or that $c \le 0$. Then an optimal solution of (P) for $f(x) := c x^\alpha$ over $[-1,1]^n$ is

$$g(x) := -\frac{|c|}{d} \sum_{i=1}^n \alpha_i x_i^d.$$

Proof First note that validity of $g$ follows from Theorem 1. To prove optimality, first assume that there exists some $r$ with $\alpha_r$ odd. Define a Borel measure $\mu$ implicitly by requiring

$$\int h\, d\mu = 2^{n-1} \int_{-1}^1 h(t \tilde e)\, dt,$$

where $\tilde e_r = -\operatorname{sgn} c$ and $\tilde e_i = 1$ for $i \neq r$. Then

$$\int 1\, d\mu = 2^n = \int_{[-1,1]^n} 1\, dx.$$

Moreover, if $j$ is even,

$$\int x_i^j\, d\mu = 2^{n-1} \int_{-1}^1 (t \tilde e_i)^j\, dt = \frac{2^n}{j+1} = \int_{[-1,1]^n} x_i^j\, dx,$$

while for $j$ odd

$$\int x_i^j\, d\mu = 2^{n-1} \int_{-1}^1 (t \tilde e_i)^j\, dt = 0 = \int_{[-1,1]^n} x_i^j\, dx.$$

Hence $\mu$ is feasible for the dual problem (D). Finally, we have

$$\int (f - g)\, d\mu = 2^{n-1} \int_{-1}^1 \left( c\, t^d \prod_k \tilde e_k^{\alpha_k} + \frac{|c|}{d} \sum_k \alpha_k t^d \tilde e_k^d \right) dt = 2^{n-1} \int_{-1}^1 \left( -|c|\, t^d + |c|\, t^d \frac{1}{d} \sum_k \alpha_k \right) dt = 0,$$

as $d$ is even and $\sum_k \alpha_k = d$. Hence the underestimator $g$ is optimal by Lemma 2. If all $\alpha_i$ are even but $c < 0$, the proof is analogous.

From the above proof, it follows that the computed underestimators remain optimal for any $\bar d \ge d$ and that these optimal underestimators are independent of the domain $[-u, u]$, as $f - g$ turns out to be globally non-negative. Next, we consider the only remaining even-degree case for $\bar d \le 4$.

Theorem 4 Let $c \ge 0$.
Then an optimal solution of (P) for $f(x) := c x_1^2 x_2^2$ over $[-1,1]^n$ for $\bar d = 4$ is

$$g(x) = -\frac{1}{2} c x_1^4 + \frac{2}{3} c x_1^2 - \frac{1}{2} c x_2^4 + \frac{2}{3} c x_2^2 - \frac{2}{9} c.$$

Proof By scaling, we may assume $c = 1$. The function $g$ is a (global) underestimator of $f$, as

$$f(x) - g(x) = \frac{1}{2} \left( x_1^2 + x_2^2 - \frac{2}{3} \right)^2 \ge 0$$

for all $x \in \mathbb{R}^2$. Consider the discrete Borel measure

$$\mu := \frac{1}{5} \left( \delta_{(\sqrt{1/3},\,\sqrt{1/3})} + \delta_{(-\sqrt{1/3},\,-\sqrt{1/3})} + \delta_{(\sqrt{1/3},\,-\sqrt{1/3})} + \delta_{(-\sqrt{1/3},\,\sqrt{1/3})} \right) + \frac{4}{5} \left( \delta_{(\sqrt{2/3},\,0)} + \delta_{(-\sqrt{2/3},\,0)} + \delta_{(0,\,\sqrt{2/3})} + \delta_{(0,\,-\sqrt{2/3})} \right).$$

Then

$$\int 1\, d\mu = \frac{4}{5} + \frac{16}{5} = 4, \qquad \int x_i^j\, d\mu = 0 \;\forall j \text{ odd},$$
$$\int x_i^2\, d\mu = \frac{1}{5} \cdot \frac{4}{3} + \frac{4}{5} \cdot \frac{4}{3} = \frac{4}{3}, \qquad \int x_i^4\, d\mu = \frac{1}{5} \cdot \frac{4}{9} + \frac{4}{5} \cdot \frac{8}{9} = \frac{4}{5};$$

this implies that $\mu$ is feasible for (D). Since $\int (f - g)\, d\mu = 0$, the result follows from Lemma 2.
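The identity used in the proof of Theorem 4 can be confirmed numerically; a small sketch for $c = 1$ (the helper name is illustrative):

```python
def g_thm4(x1, x2):
    # Theorem 4 underestimator of x1^2 * x2^2 over [-1,1]^2 (c = 1)
    return -0.5 * x1**4 + 2/3 * x1**2 - 0.5 * x2**4 + 2/3 * x2**2 - 2/9

worst = 0.0
N = 50
for i in range(N + 1):
    for j in range(N + 1):
        x1, x2 = -1 + 2 * i / N, -1 + 2 * j / N
        diff = x1**2 * x2**2 - g_thm4(x1, x2)
        sos = 0.5 * (x1**2 + x2**2 - 2/3)**2
        worst = max(worst, abs(diff - sos))
print(worst < 1e-12)  # True: f - g equals the sum-of-squares certificate
```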

Fig. 2 Optimal separable underestimators of degree $\bar d \le 4$ for $f(x) = x_1^2 x_2^2$ (left) and $f(x) = x_1^2 x_2$ (right)

See Fig. 2 for an illustration. In this case, the proof again shows optimality independently of the domain, but not of the degree $\bar d$. In fact, when allowing degree $\bar d = 6$, a better underestimator can be computed. The following results complete the quartic case.

Theorem 5 An optimal solution of (P) for $f(x) := c x_1 x_2 x_3$ over $[-1,1]^n$ is

$$g(x) = -\frac{|c|}{6} \left( \sqrt{\frac{5}{3}}\, (x_1^4 + x_2^4 + x_3^4) + \sqrt{\frac{3}{5}}\, (x_1^2 + x_2^2 + x_3^2) \right).$$

Proof By symmetry and scaling, we may assume $c = 1$. The function $g$ is a (global) underestimator of $f$, as

$$f(x) - g(x) = x_1 x_2 x_3 + \frac{1}{6} \left( \sqrt{\frac{5}{3}} (x_1^4 + x_2^4 + x_3^4) + \sqrt{\frac{3}{5}} (x_1^2 + x_2^2 + x_3^2) \right)$$
$$= \frac{\sqrt{15}}{18} \left( \left( x_1 x_2 + \frac{\sqrt{15}}{5} x_3 \right)^2 + \left( x_1 x_3 + \frac{\sqrt{15}}{5} x_2 \right)^2 + \left( x_2 x_3 + \frac{\sqrt{15}}{5} x_1 \right)^2 \right) + \frac{\sqrt{15}}{72} \left( x_1^2 + x_2^2 - 2 x_3^2 \right)^2 + \frac{\sqrt{15}}{24} \left( x_1^2 - x_2^2 \right)^2.$$

Consider the discrete Borel measure

$$\mu := \frac{32}{9}\, \delta_{(0,0,0)} + \frac{10}{9} \left( \delta_{(-a,\,a,\,a)} + \delta_{(a,\,-a,\,a)} + \delta_{(a,\,a,\,-a)} + \delta_{(-a,\,-a,\,-a)} \right), \qquad a := \sqrt{3/5}.$$

Then

$$\int 1\, d\mu = \frac{32}{9} + 4 \cdot \frac{10}{9} = 8, \qquad \int x_i^j\, d\mu = 0 \;\forall j \text{ odd},$$
$$\int x_i^2\, d\mu = \frac{10}{9} \cdot 4 \cdot \frac{3}{5} = \frac{8}{3}, \qquad \int x_i^4\, d\mu = \frac{10}{9} \cdot 4 \cdot \frac{9}{25} = \frac{8}{5};$$

this implies that $\mu$ is feasible for (D). Since $\int (f - g)\, d\mu = 0$, the result follows from Lemma 2.

Theorem 6 An optimal solution of (P) for $f(x) := c x_1^2 x_2$ over $[-1,1]^n$ has the form

$$g(x) = |c|\, c_{14} x_1^4 + |c|\, c_{12} x_1^2 + |c|\, c_{24} x_2^4 + c\, c_{23} x_2^3 + |c|\, c_{22} x_2^2 + c\, c_{21} x_2 + |c|\, c_0$$

with

$c_{14} \approx -0.820950196623141$, $c_{12} \approx +0.472574632153429$, $c_{24} \approx +0.052169786129554$, $c_{23} \approx +0.057375067121109$, $c_{22} \approx -0.311590671084123$, $c_{21} \approx +0.272799782607049$, $c_0 \approx -0.070759806839502$.

Proof By symmetry and scaling, we may assume $c = 1$ and hence $f(x) = x_1^2 x_2$. We only sketch the proof; details can be checked with the help of computer algebra packages. Let $\delta$ be the third-smallest real zero of

$$30625 z^{10} - 98750 z^9 - 366375 z^8 - 290700 z^7 + 43050 z^6 + 145280 z^5 + 50334 z^4 - 4148 z^3 - 4259 z^2 - 322 z + 49,$$

moreover define $\gamma = \frac{1}{5\delta}$. Let $\varphi$ be the smaller zero of the quadratic polynomial in $z$

$$(250\gamma^4 + 400\gamma^3 + 300\gamma^2 + 280\gamma + 50)\, z^2 + (-200\gamma - 40)\, z - 35\gamma^2 + 34\gamma + 7.$$

Finally, set $\alpha := \frac{2(5\gamma+1)(\gamma-1+2\varphi)}{5\gamma^2-2\gamma-1}$ and $\beta := \varphi (5\gamma^2+1)(\gamma+1)$. Numerically, we have

$$\delta \approx 0.63445, \quad \gamma \approx 0.31523, \quad \alpha \approx 0.89688, \quad \beta \approx 0.47981.$$

Now define

$$c_{14} = \frac{1}{2} \cdot \frac{\gamma - 1}{\alpha - \beta}, \qquad c_{12} = \frac{\beta - \alpha\gamma}{\alpha - \beta},$$
$$c_{24} = -\frac{1}{4} \cdot \frac{\alpha^2}{(\delta+1)^2 (\alpha-\beta)} + \frac{1}{2} \cdot \frac{\beta^2}{(\gamma+1)(\gamma+\delta)^2 (\alpha-\beta)},$$
$$c_{23} = -\frac{1}{4} \cdot \frac{\alpha^2 (\gamma - 1 - 2\delta)}{(\delta+1)^2 (\alpha-\beta)} - \frac{\beta^2 \delta}{(\gamma+1)(\gamma+\delta)^2 (\alpha-\beta)},$$
$$c_{22} = \frac{1}{4} \cdot \frac{\alpha^2 (2\gamma\delta + \gamma - 2\delta - \delta^2)}{(\delta+1)^2 (\alpha-\beta)} + \frac{1}{2} \cdot \frac{\beta^2 (\delta-1)(\delta+1)}{(\gamma+1)(\gamma+\delta)^2 (\alpha-\beta)},$$
$$c_{21} = -\frac{1}{4} \cdot \frac{\alpha^2 \delta (2\gamma + \delta\gamma - \delta)}{(\delta+1)^2 (\alpha-\beta)} + \frac{\beta^2 \delta}{(\gamma+1)(\gamma+\delta)^2 (\alpha-\beta)},$$
$$c_0 = \frac{1}{4} \cdot \frac{\alpha^2 \gamma \delta^2}{(\delta+1)^2 (\alpha-\beta)} - \frac{1}{2} \cdot \frac{\beta^2 \delta^2}{(\gamma+1)(\gamma+\delta)^2 (\alpha-\beta)}.$$

It can be shown that, with these definitions, $(f - g)(x) = \sigma_1 + \sigma_2 (1 - x_2^2)$, where $\sigma_1$ is a sum-of-squares of degree 8 and $\sigma_2$ is a sum-of-squares of degree 6. Hence $f - g$ is non-negative on $[-1,1]^n$ and thus $g$ is feasible for (P). We next construct a dual solution as follows: the measure $\mu$ is defined as

$$\mu = \mu_1 \delta_{p_1} + \mu_2 \delta_{p_2} + \mu_3 \left( \delta_{p_3^+} + \delta_{p_3^-} \right) + \mu_4 \left( \delta_{p_4^+} + \delta_{p_4^-} \right)$$

with

$$p_1 = (0, 1), \qquad p_2 = (0, \delta), \qquad p_3^\pm = \left( \pm\sqrt{\alpha},\, -1 \right), \qquad p_4^\pm = \left( \pm\sqrt{\beta},\, -\gamma \right)$$

and

$$\mu_1 = \frac{1000}{3} \cdot \frac{\gamma^4}{(5\gamma-1)(5\gamma^2+1)(5\gamma+1)} \approx 1.48151, \qquad \mu_2 = \frac{2}{3} \cdot \frac{5\gamma^2 - 2\gamma - 1}{(5\gamma+1)(\gamma-1)} \approx 0.42841,$$
$$\mu_3 = -\frac{8}{3} \cdot \frac{1}{(\gamma-1)(\gamma+1)(5\gamma^2+1)} \approx 1.97808, \qquad \mu_4 = \frac{2}{3} \cdot \frac{5\gamma^2 + 2\gamma - 1}{(\gamma+1)(5\gamma-1)} \approx 0.11201.$$

This measure is dual feasible, as all points $p_i$ belong to $[-1,1]^n$ and all dual constraints are satisfied. It thus suffices to show that $f$ agrees with $g$ on all points $p_i$; the result then follows from Lemma 2 again.

Fig. 3 The optimal separable underestimator for $f(x) = x_1^2 x_2$ of degree $\bar d \le 4$ is not globally feasible

Unlike in all other quartic cases, the optimal local underestimator in Theorem 6 is not valid globally, but requires $x_2 \in [-1,1]$; see Fig. 3. In Table 1, we list all optimal underestimators of degree at most four for quartic monomials. Note that the global underestimators devised in Sect. 2.2 are optimal in the case $d = 4$ if at least one $\alpha_i$ is odd or $c \le 0$, as shown in Theorem 3. In fact, the proof of Theorem 3 clearly carries over to arbitrary even degrees with at least one $\alpha_i$ odd or $c \le 0$. On the other hand, by Theorems 5 and 6, the underestimator of Sect. 2.2 is not optimal in the case $d = 3$.
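Feasibility of the Theorem 6 underestimator for (P) (with the hardcoded coefficients and $c = 1$) can be spot-checked numerically on $[-1,1]^2$; a small sketch with illustrative names:

```python
C14, C12 = -0.820950196623141, 0.472574632153429
C24, C23 = 0.052169786129554, 0.057375067121109
C22, C21, C0 = -0.311590671084123, 0.272799782607049, -0.070759806839502

def g_thm6(x1, x2):
    # Theorem 6 underestimator of x1^2 * x2 (c = 1); valid on [-1,1]^2 only
    return (C14 * x1**4 + C12 * x1**2 + C24 * x2**4 + C23 * x2**3
            + C22 * x2**2 + C21 * x2 + C0)

grid = [i / 20 - 1 for i in range(41)]
ok = all(x1**2 * x2 >= g_thm6(x1, x2) - 1e-9 for x1 in grid for x2 in grid)
print(ok)  # True: g stays below f on the box
```

Repeating the check with $|x_2| > 1$ fails, which is exactly the local validity discussed above.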

2.5 Optimal underestimators for quartic monomials over integer variables

When computing optimal underestimators according to (P), we do not take into account any given integrality constraints on the variables; these are considered only when minimizing the 123 J Glob Optim

resulting underestimator.

Table 1 Optimal separable underestimators for monomials of degree $d \le 4$ for $\bar d \le 4$ (underestimator for $+x^\alpha$ / for $-x^\alpha$):

$\pm 1$: $1$ / $-1$
$\pm x_1$: $x_1$ / $-x_1$
$\pm x_1 x_2$: $-\frac{1}{2}(x_1^2 + x_2^2)$ (both signs)
$\pm x_1^2$: $x_1^2$ / $-x_1^2$
$\pm x_1 x_2 x_3$: $-\frac{1}{6}\left( \sqrt{5/3}\,(x_1^4 + x_2^4 + x_3^4) + \sqrt{3/5}\,(x_1^2 + x_2^2 + x_3^2) \right)$ (both signs)
$\pm x_1^2 x_2$: $\approx -0.82 x_1^4 + 0.47 x_1^2 + 0.052 x_2^4 \pm 0.057 x_2^3 - 0.31 x_2^2 \pm 0.27 x_2 - 0.071$ (upper signs for $+$, lower for $-$)
$\pm x_1^3$: $x_1^3$ / $-x_1^3$
$\pm x_1 x_2 x_3 x_4$: $-\frac{1}{4}(x_1^4 + x_2^4 + x_3^4 + x_4^4)$ (both signs)
$\pm x_1^2 x_2 x_3$: $-\frac{1}{4}(2 x_1^4 + x_2^4 + x_3^4)$ (both signs)
$\pm x_1^2 x_2^2$: $-\frac{1}{2} x_1^4 + \frac{2}{3} x_1^2 - \frac{1}{2} x_2^4 + \frac{2}{3} x_2^2 - \frac{2}{9}$ / $-\frac{1}{2}(x_1^4 + x_2^4)$
$\pm x_1^3 x_2$: $-\frac{1}{4}(3 x_1^4 + x_2^4)$ (both signs)
$\pm x_1^4$: $x_1^4$ / $-x_1^4$

In theory, the underestimators could be improved by replacing (P) with the linear program

$$\max \frac{1}{|F|} \sum_{x \in F} g(x) \quad \text{s.t. } g(x) \le f(x) \;\forall x \in F, \;\; g \text{ separable polynomial of degree at most } \bar d, \tag{P'}$$

where $F$ is the finite set of integer feasible solutions. This replacement affects both the constraints, which are weaker than in (P), and the objective function, which now only takes the integer solutions into account when measuring the quality of an underestimator. As $F$ is exponential in general, even if the bounds on the integer variables are fixed, this problem cannot necessarily be solved in polynomial time. Nevertheless, we could stick to our approach proposed above and consider monomials independently, computing optimal underestimators offline. However, as a consequence of branching, we also need to consider all non-trivial contiguous subdomains for all appearing variables. By this, the number of underestimators that have to be computed and hardcoded grows quickly with the domain sizes of the integer variables. At the same time, for large domains it is likely that the optimal solution of (P') does not differ significantly from the optimal solution of (P). In the following, we thus restrict ourselves to the binary and ternary case.
After translation and scaling, the two domains in question are thus $\{-1, 1\}$ and $\{-1, 0, 1\}$. In the respective cases, we may exploit that

$$x_i^2 = 1 \text{ if } x_i \text{ is binary}, \qquad x_i^3 = x_i \text{ if } x_i \text{ is ternary.} \tag{*}$$

In particular, the optimal underestimators for binary or ternary problems can be assumed to be linear or quadratic, respectively. Again considering the quartic case, we thus need to take into account all 24 monomials as before, but for each of those we need to consider each combination of binary and ternary variables, which means up to 16 combinations. However, using symmetry and (*), only 44 cases have to be considered, as listed in Table 2. Note that Problem (P') has many optimal solutions in general; in the table we prefer symmetric solutions over sparse ones. We also state by how much the given optimal solution of (P') improves the optimal solution of (P) with respect to the objective function of (P'), both in absolute terms and in terms of the gap closed with respect to the average value of the given monomial on the feasible set $F$.

It turns out that the weaker constraints in (P') compared to (P) only have a minor effect on the optimal underestimator. One of the few cases where the optimal solution of (P') is infeasible for (P) is $x_1 x_2$ for binary $x_1$ and ternary $x_2$: the underestimator $-x_2^2$ is violated at $(x_1, x_2) = (-1, \frac{1}{2})$, where $x_1 x_2 = -\frac{1}{2} < -\frac{1}{4} = -x_2^2$; see Fig. 4. The change in the objective function has a slightly stronger effect, but altogether the improvements are negligible; in the pure binary case, the maximal percentage of gap closed is about 3%. We can thus conclude that using optimal solutions according to (P) is a reasonable approach also for integer variables with small domains.
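The exponent reduction (*) can be sketched as a small helper (the function name and the string-valued domains are illustrative):

```python
def reduce_exponent(a, domain):
    """Reduce the exponent a of one variable using (*)."""
    if domain == "binary":           # x in {-1, 1}: x^2 = 1
        return a % 2
    if domain == "ternary":          # x in {-1, 0, 1}: x^3 = x
        return a if a <= 2 else 2 - (a % 2)   # 3 -> 1, 4 -> 2, 5 -> 1, ...
    return a

print([reduce_exponent(a, "binary") for a in range(5)])   # [0, 1, 0, 1, 0]
print([reduce_exponent(a, "ternary") for a in range(5)])  # [0, 1, 2, 1, 2]
```

This is why the optimal underestimators can be assumed linear in the binary case and quadratic in the ternary case.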

3 Branch-and-bound algorithm

We propose to solve (MIPO) by a branch-and-bound algorithm in which the separable underestimators presented in Sect. 2 are used to compute lower bounds. We branch by splitting up the domain of a variable, generating two subproblems. The aim of branching is to make the feasible box smaller and to improve the bound given by the separable underestimator. This can be achieved either because the separable underestimator changes or simply because the feasible set is smaller. All further features of our algorithm are straightforward: the branching rule simply consists of choosing a variable $x_i$ with the largest domain, i.e., with maximal $u_i - l_i$. This domain is divided in the middle. More sophisticated branching strategies did not lead to any improvements. Our exploration strategy is depth-first; again, no improvements were obtained by more refined strategies.

Note that, as mentioned before, we do not relax the integrality requirement when minimizing the separable underestimator. At each node of the branch-and-bound tree we hence get not only a lower bound but also an upper bound, i.e., a feasible solution for our original problem (MIPO). All we need is to re-evaluate the objective function of (MIPO) at the minimizer of the separable underestimator. No further primal heuristic has been implemented.
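The overall scheme can be sketched for pure-integer instances as follows. Note the simplifications relative to PolyOpt: the lower bound comes from the weaker global monomial-wise underestimators of Theorems 1 and 2 (no shifting and no hardcoded optimal underestimators), and the univariate pieces are minimized by enumeration; all names are illustrative.

```python
from math import prod

def separable_lb(coeffs, n):
    """Per-variable univariate bound pieces (Theorems 1 and 2) and a constant."""
    uni = [[] for _ in range(n)]
    const = 0.0
    for alpha, c in coeffs.items():
        d = sum(alpha)
        if d == 0:
            const += c
            continue
        for i, a in enumerate(alpha):
            if a == 0:
                continue
            if d % 2 == 0:   # Theorem 1
                uni[i].append(lambda t, a=a, c=c, d=d: -abs(c) / d * a * t**d)
            else:            # Theorem 2
                uni[i].append(lambda t, a=a, c=c, d=d:
                              -abs(c) / (2 * d) * a * (t**(d + 1) + t**(d - 1)))
    return uni, const

def branch_and_bound(coeffs, l, u):
    n = len(l)
    f = lambda x: sum(c * prod(x[i]**a for i, a in enumerate(alpha))
                      for alpha, c in coeffs.items())
    uni, const = separable_lb(coeffs, n)
    best = f(l)                              # any feasible point as incumbent
    stack = [(tuple(l), tuple(u))]
    while stack:                             # depth-first exploration
        lo, hi = stack.pop()
        # minimize each univariate piece by enumeration over [lo_i, hi_i]
        xstar = [min(range(a, b + 1), key=lambda t: sum(g(t) for g in uni[i]))
                 for i, (a, b) in enumerate(zip(lo, hi))]
        lb = const + sum(g(xstar[i]) for i in range(n) for g in uni[i])
        best = min(best, f(xstar))           # re-evaluate f: upper bound
        if lb >= best or lo == hi:
            continue                         # prune, or leaf node
        i = max(range(n), key=lambda j: hi[j] - lo[j])   # widest domain
        mid = (lo[i] + hi[i]) // 2                       # split in the middle
        stack.append((lo, hi[:i] + (mid,) + hi[i + 1:]))
        stack.append((lo[:i] + (mid + 1,) + lo[i + 1:], hi))
    return best

# f(x) = x1*x2 + x1^2 - 3*x2 over {-2,...,2}^2; the optimum is f(-1, 2) = -7
coeffs = {(1, 1): 1.0, (2, 0): 1.0, (0, 1): -3.0}
print(branch_and_bound(coeffs, (-2, -2), (2, 2)))  # -7.0
```

Since the monomial bounds are globally valid, pruning is safe, and because every leaf is a singleton box, the method always terminates with the exact integer optimum.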

4 Experimental results

For testing our algorithm, we implemented the proposed branch-and-bound scheme, which we call PolyOpt, in C++. All experiments were run on a cluster of 64-bit Intel(R) Xeon(R) E5-2670 processors running at 2.60 GHz with 20 MB cache. As for the instances, we randomly generated continuous, mixed, and integer instances with polynomials $f$ of degree $d = 4$. We used the following parameters:

– The number of variables is $n \in \{10, 15, 20, 25, 30, 35, 40\}$.
– For the bounds on the variables we consider

– $[l_i, u_i] = [-10, 10]$ for continuous, mixed-integer, and integer instances,
– $[l_i, u_i] = [-1, 1]$ for integer (i.e., ternary) instances, and
– $[l_i, u_i] = [0, 1]$ for integer (i.e., binary) instances,

for all $i = 1,\dots,n$ in each case.

Table 2 Optimal separable underestimators for monomials of degree $d \le 4$, binary and ternary variables

[Fig. 4: The optimal separable underestimator for f(x) = x1 x2 for binary x1 and ternary x2 is −x2^2]

– The number m of monomials appearing in f is either a multiple of n or covers all possible monomials of degree ≤ 4 (complete instances).
– The coefficients of each monomial were chosen uniformly at random in the interval [−1, 1].
– The variables composing each monomial and the corresponding degree are computed by randomly generating four values within the variable index set plus 0, i.e., {0, 1, ..., n}. Each time the value 0 is generated, the degree of the monomial decreases by one. The number of repetitions of the same index thus determines the degree of the corresponding variable.
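A minimal sketch of this monomial-generation scheme follows (the function name and data layout are ours, for illustration; the instances actually used are those available at the URL below):

```python
import random

def random_monomial(n, rng):
    """Sample one monomial of degree <= 4 over x_1..x_n, following the
    scheme described above: draw four values uniformly from {0, 1, ..., n};
    each 0 lowers the degree by one, and the multiplicity of a nonzero
    index gives that variable's exponent."""
    exponents = {}
    for _ in range(4):
        i = rng.randint(0, n)
        if i != 0:  # a drawn 0 reduces the monomial's degree by one
            exponents[i] = exponents.get(i, 0) + 1
    coefficient = rng.uniform(-1.0, 1.0)  # uniform in [-1, 1]
    return coefficient, exponents

rng = random.Random(0)
coef, expo = random_monomial(10, rng)  # e.g. {2: 2, 5: 1} encodes x_2^2 * x_5
```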

We generated 10 different instances for each possible combination of the values above so as to show an average behavior. All instances are available for download at http://www.mathematik.tu-dortmund.de/lsv/instances/mipo.tar.gz. PolyOpt's performance was compared against four general MINLP solvers, namely ANTIGONE 24.7.1 [18], BARON 12.7.7 [23], Couenne 0.4 [1,11], and SCIP 3.0.1 [21], as well as a solver for continuous polynomial optimization, namely GloptiPoly 3.6.1 [15]. We selected these solvers because they are capable of solving problems of type (MIPO) with any degree d, domain size, and variable type. As GloptiPoly has been developed for continuous polynomial optimization problems, we modeled the variables' integrality by adding a polynomial constraint, as the GloptiPoly developers suggested. The only option we set for all the methods was a time limit of one CPU hour. Note that all these solvers have been developed for more general classes of problems; in particular, all of them allow non-trivial constraints. However, to the best of our knowledge, no other exact methods capable of solving problem (MIPO) have been proposed in the literature. Recently, Dua [14] devised an approach that is tailored for mixed-integer polynomial optimization problems. However, it requires enumerating all possible combinations of values for the integer variables, so the approach is not practical when the problem contains too many integer variables or integer variables with large domains. Tables 3–5 show results on pure integer instances, while results on continuous and mixed-integer instances are presented in Tables 6 and 7, respectively. In each table, the first column reports n, the number of variables, and the second m, the number of monomials. The first block of rows after the header lists the results for complete instances.
Each of columns three to nine corresponds to one of the compared approaches, namely SCIP, Couenne, BARON, ANTIGONE, GloptiPoly, and PolyOpt in two versions, namely the one considering the global underestimator of Sect. 2.2 (PolyOpt-su) and the one considering the optimal underestimators of Sect. 2.4 (PolyOpt), respectively. At each entry we find the average CPU time needed to solve one instance to global optimality, while in parentheses we state how many instances were solved out of 10. Whenever a solver is not able to solve any of the 10 instances within the time limit, we report it as ***. If only a part of the instances was solved, the given average running time refers only to the solved instances. Note that GloptiPoly returned several times the message "cannot guarantee optimality"; in this case, we did not consider the instance as solved.

[Table 3: Results for bounds [−10, 10], integer variables (columns: n, m, SCIP, Couenne, BARON, ANTIGONE, GloptiPoly, PolyOpt-su, PolyOpt)]

[Table 4: Results for bounds [−1, 1], integer variables]

In Table 3, results for instances with integer variables in the box [−10, 10] are shown. We first note that GloptiPoly cannot solve any instance, not even for n = 10. This is clearly due to the high-degree polynomial constraints needed to model the integrality requirements. The method that performs best in terms of solved instances is PolyOpt, with 281 solved instances out of 310; SCIP, Couenne, BARON, ANTIGONE, and PolyOpt-su can solve 133, 75, 180, 194, and 215 of them, respectively. Having a closer look, we can observe that the denser the instances are, i.e., the larger the number of monomials, the better PolyOpt's performance is with respect to the other methods. In fact, it is the only method that, in both versions, could solve complete instances on 10 variables. Complete instances for n = 15 could not be solved by any of the methods. Table 4 shows the results for pure integer instances with bounds [−1, 1]. We generated these instances so as to test the influence of the size of the variable domains on the performance of the different methods.
These instances turn out to be slightly easier than the previous ones: the complete instances with n = 10 are solved by PolyOpt, PolyOpt-su, BARON, ANTIGONE, and GloptiPoly, while Couenne solves 8 out of 10 instances. For n = 15, only PolyOpt solves all the complete instances; GloptiPoly can solve 8 out of 10, while the other methods cannot solve any instance. PolyOpt is again the clear winner in number of solved instances, with 213 out of 220. GloptiPoly performs better here with 72 instances, while PolyOpt-su, SCIP, Couenne, BARON, and ANTIGONE can solve 73, 93, 164, 117, and 143, respectively. The results confirm that PolyOpt performs best also on smaller variable domains. Moreover, except on the sparser instances, PolyOpt outperforms the other methods also in terms of average CPU time. Next we show the results for binary variables (Table 5). For these instances, we go up to n = 25 for the complete instances and to n = 40 for the others. In this case, SCIP, Couenne, and ANTIGONE outperform the other methods both in terms of solved instances and of average CPU time, the only exception being the complete instances. For these, only GloptiPoly and PolyOpt could solve all instances up to n = 20, but the latter shows better average CPU times and could also solve some of the instances for n = 25. This behavior can be explained as follows: SCIP and Couenne both have specific heuristics and cuts to deal with binary variables, while PolyOpt does not. The only special feature we implemented in PolyOpt was to exploit the fact that x_i^2 = x_i in the binary case, so that we could replace each x_i^e with x_i for any e ≥ 2, potentially lowering the degree of the monomials. Note that this feature was not enough to make PolyOpt-su competitive in the binary case: both in the previous tables and especially here, it is clear that the performance of PolyOpt changes significantly when considering the optimal underestimators instead of the global ones.
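The exponent-reduction step just mentioned can be sketched as follows (the function and the monomial encoding are illustrative, not taken from the PolyOpt implementation):

```python
def reduce_binary_exponents(coef, expo, binary_vars):
    """Since x^e = x for x in {0, 1} and any e >= 1, cap the exponent of
    every binary variable at 1; exponents of other variables are kept.
    `expo` maps variable indices to exponents."""
    new_expo = {i: (1 if i in binary_vars else e) for i, e in expo.items()}
    return coef, new_expo

# 0.5 * x1^3 * x2^2 with x1 binary becomes 0.5 * x1 * x2^2
coef, expo = reduce_binary_exponents(0.5, {1: 3, 2: 2}, binary_vars={1})
```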
[Table 5: Results for bounds [0, 1], integer variables]

We also tested the case of bounds [−10, 10] with continuous or mixed-integer instances, see Tables 6 and 7, respectively. As expected, the continuous instances turn out to be easier to solve for GloptiPoly. SCIP shows the weakest performance, while ANTIGONE outperforms all other approaches on large instances, however showing a much more volatile behaviour than on integer instances. The performance on the mixed-integer instances is closer to the one seen in the pure integer case, with the exception of Couenne, which performs better.

[Table 6: Results for bounds [−10, 10], continuous variables]

[Table 7: Results for bounds [−10, 10], mixed-integer variables]

In both cases, the performance of PolyOpt-su is always dominated by that of PolyOpt. In summary, Tables 6 and 7 confirm that PolyOpt outperforms the other solvers in most cases, the main exception being the binary instances.

5 Conclusions and future directions

In this paper we presented a new method, called PolyOpt, for solving box-constrained polynomial minimization problems. It consists of a branch-and-bound scheme in which lower bounds are calculated by minimizing separable underestimators. These underestimators make the method effective, as they are quickly computed and provide a decent approximation of the original problem. Extensive computational results confirm the effectiveness of PolyOpt. An interesting future direction could be to extend our approach to problems featuring non-trivial constraints. A straightforward idea would be to apply Lagrangian relaxation. If the problem contains only linear or, more generally, polynomial constraints, the additional Lagrangian term in the objective function would not prevent our method from solving the problem. However, updating the Lagrangian multipliers is not a trivial task, and a sophisticated stabilization technique would be needed.

Acknowledgments The work is partially supported by the EU grant FP7-PEOPLE-2012-ITN No. 316647 "Mixed-Integer Nonlinear Optimization". The second author gratefully acknowledges the partial financial support under Grant ANR 12-JS02-009-01 "ATOMIC".

References

1. Belotti, P., Lee, J., Liberti, L., Margot, F., Wächter, A.: Branching and bounds tightening techniques for non-convex MINLP. Optim. Methods Softw. 24, 597–634 (2009)
2. Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-integer nonlinear optimization. Acta Numer. 22, 1–131 (2013)
3. Billionnet, A., Elloumi, S., Lambert, A.: Extending the QCR method to general mixed integer programs. Math. Progr. 131(1), 381–401 (2012)
4. Buchheim, C., D'Ambrosio, C.: Box-constrained mixed-integer polynomial optimization using separable underestimators. In: Integer Programming and Combinatorial Optimization, 17th International Conference, IPCO 2014, LNCS, vol. 8494, pp. 198–209 (2014)
5. Buchheim, C., Rinaldi, G.: Efficient reduction of polynomial zero-one optimization to the quadratic case. SIAM J. Optim. 18(4), 1398–1413 (2007). doi:10.1137/050646500
6. Buchheim, C., Traversi, E.: Separable non-convex underestimators for binary quadratic programming. In: 12th International Symposium on Experimental Algorithms, SEA 2013, LNCS, vol. 7933, pp. 236–247 (2013)
7. Buchheim, C., Wiegele, A.: Semidefinite relaxations for non-convex quadratic mixed-integer programming. Math. Progr. 141(1–2), 435–452 (2013). doi:10.1007/s10107-012-0534-y
8. Buchheim, C., De Santis, M., Palagi, L., Piacentini, M.: An exact algorithm for nonconvex quadratic integer minimization using ellipsoidal relaxations. SIAM J. Optim. 23(3), 1867–1889 (2013)
9. Burer, S.: Optimizing a polyhedral-semidefinite relaxation of completely positive programs. Math. Progr. Comput. 2(1), 1–19 (2010)
10. Burer, S., Letchford, A.N.: Non-convex mixed-integer nonlinear programming: a survey. Surv. Oper. Res. Manag. Sci. 17, 97–106 (2012)
11. COUENNE (v. 0.4). projects.coin-or.org/Couenne
12. D'Ambrosio, C., Lodi, A.: Mixed integer nonlinear programming tools: a practical overview. 4OR 9(4), 329–349 (2011)
13. D'Ambrosio, C., Lodi, A.: Mixed integer nonlinear programming tools: an updated practical overview. Ann. Oper. Res. 204(1), 301–320 (2013)
14. Dua, V.: Mixed integer polynomial programming. Comput. Chem. Eng. 72, 387–394 (2015)


15. Henrion, D., Lasserre, J.B., Loefberg, J.: GloptiPoly 3: moments, optimization and semidefinite programming. Optim. Methods Softw. 24, 761–779 (2009)
16. Lasserre, J.B.: Convergent SDP-relaxations in polynomial optimization with sparsity. SIAM J. Optim. 17, 822–843 (2006)
17. Lasserre, J.B., Thanh, T.P.: Convex underestimators of polynomials. J. Glob. Optim. 56, 1–25 (2013)
18. Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59(2), 503–526 (2014)
19. Parrilo, P.A., Sturmfels, B.: Minimizing polynomial functions. Algorithmic and quantitative real algebraic geometry. DIMACS Ser. Discrete Math. Theor. Comput. Sci. 60, 83–99 (2001)
20. Rosenberg, I.G.: Reduction of bivalent maximization to the quadratic case. Cahiers du Centre d'Etudes de Recherche Opérationelle 17, 71–74 (1975)
21. SCIP (v. 3.0.1). http://scip.zib.de/scip.shtml
22. Shor, N.Z.: Class of global minimum bounds of polynomial functions. Cybernetics 23, 731–734 (1987)
23. Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Progr. 103, 225–249 (2005)
24. Vandenbussche, D., Nemhauser, G.L.: A branch-and-cut algorithm for nonconvex quadratic programs with box constraints. Math. Progr. 102(3), 559–575 (2005)

Math. Program., Ser. B (2012) 136:375–402. DOI 10.1007/s10107-012-0608-x

FULL LENGTH PAPER

A storm of feasibility pumps for nonconvex MINLP

Claudia D’Ambrosio · Antonio Frangioni · Leo Liberti · Andrea Lodi

Received: 14 August 2010 / Accepted: 12 February 2011 / Published online: 2 November 2012 © Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2012

Abstract One of the foremost difficulties in solving Mixed-Integer Nonlinear Programs, either with exact or heuristic methods, is to find a feasible point. We address this issue with a new feasibility pump algorithm tailored for nonconvex Mixed-Integer Nonlinear Programs. Feasibility pumps are algorithms that iterate between solving a continuous relaxation and a mixed-integer relaxation of the original problem. Such approaches currently exist in the literature for Mixed-Integer Linear Programs and convex Mixed-Integer Nonlinear Programs; both cases exhibit the distinctive property that the continuous relaxation can be solved in polynomial time. In nonconvex Mixed-Integer Nonlinear Programming such a property does not hold, and therefore special care has to be exercised in order to allow feasibility pump algorithms to rely only on local optima of the continuous relaxation. Based on a new, high-level view of feasibility pump algorithms as a special case of the well-known successive projection method, we show that many possible different variants of the approach can be developed, depending on how several different (orthogonal) implementation choices are taken. A remarkable twist of feasibility pump algorithms is that, unlike most previous successive projection methods from the literature, projection is "naturally" taken in two different norms in the two different subproblems. To cope with this issue while retaining the local convergence properties of standard successive projection methods, we propose the introduction of appropriate norm constraints in the subproblems; these actually seem to significantly improve the practical performance of the approach. We present extensive computational results on the MINLPLib, showing the effectiveness and efficiency of our algorithm.

This paper extends [16].

C. D'Ambrosio · L. Liberti, LIX, École Polytechnique, Paris, France; e-mail: [email protected], [email protected]

A. Frangioni, Dipartimento di Informatica, Università di Pisa, Pisa, Italy; e-mail: [email protected]

A. Lodi (corresponding author), DEI, Università di Bologna, Bologna, Italy; e-mail: [email protected]

Keywords Feasibility pump · MINLP · Global optimization · Nonconvex NLP

Mathematics Subject Classification 90C11 Mixed integer programming · 90C26 Nonconvex programming, global optimization · 90C59 Approximation methods and heuristics

1 Introduction

Mixed-Integer Nonlinear Programming (MINLP) problems are mathematical programs of the following form:

min { f(x, y) : g(x, y) ≤ 0, (x, y) ∈ X, x ∈ Z^p },   (1)

where x are integer decision variables, y are continuous decision variables, X ⊆ R^{p+q} is a polyhedron (which possibly includes variable bounds), f : R^{p+q} → R and g : R^{p+q} → R^m. We remark that f can be assumed convex without loss of generality (if it were not, we might replace it by an added variable v and adjoin the constraint f(x, y) − v ≤ 0 as a further component of g). The exact solution of nonconvex MINLPs is only possible for certain classes of functions f, g (e.g., if f is linear and g involves bilinear terms xy [2,11]). In general, the spatial Branch-and-Bound (sBB) algorithm is used to obtain ε-approximate solutions for a given positive constant ε. The sBB computes upper and lower bounds on the objective function value within sets belonging to an iteratively refined partition of the feasible region. The search is pruned when the lower bound on the current set is worse than the best feasible solution found so far (the incumbent), when the problem restricted to the current set is infeasible, or when the two bounds for the current set are within ε. Otherwise, the current set is partitioned and the search continues recursively [6,35]. Heuristic approaches to solving MINLPs include Variable Neighbourhood Search [30], automatically tuned variable fixing strategies [7], Local Branching [31], and others; moreover, most exact approaches for convex MINLPs [8,21] work as heuristics for nonconvex MINLPs. In heuristic approaches, however, one of the main algorithmic difficulties connected to MINLPs is to find a feasible solution. From the worst-case complexity point of view, finding a feasible MINLP solution is as hard as finding a feasible Nonlinear Programming (NLP) solution, which is NP-hard [36]. In this paper we address the issue of MINLP feasibility by extending a well-known approach, namely the Feasibility Pump (FP), to the nonconvex MINLP case.
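As a side note, the epigraph reformulation invoked above to justify the convexity assumption on f can be written out explicitly: problem (1) is equivalent to

```latex
\min_{x,y,v} \; v
\quad \text{s.t.} \quad
f(x,y) - v \le 0, \quad g(x,y) \le 0, \quad (x,y) \in X, \quad x \in \mathbb{Z}^p,
```

where the objective is now linear and the constraint f(x, y) − v ≤ 0 is adjoined as a further component of g.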

The FP algorithm was originally proposed for Mixed-Integer Linear Programming (MILP) [20], where f, g are linear forms, and then extended to convex MINLPs [8], where g are convex functions. In both cases the feasible region is partitioned so that two subproblems are iteratively solved: a problem P1 involving the continuous variables y with relaxed integer variables x, and a problem P2 involving both integer and continuous variables x, y targeting, through its objective function, the continuous solution of P1. The two subproblems are iteratively solved, generating sequences of values for x and y. One of the main theoretical issues in FP is to show that these sequences do not cycle, i.e., are not periodic but converge to some feasible point (x, y). This is indeed the case for the FP version proposed for convex MINLPs [8], where P2 is a MILP, while cycling might happen in the original FP version proposed for MILPs [20], where randomization is effectively (and cheaply) used as an escaping mechanism. In the FP for MILPs, P1 is a Linear Program (LP) and P2 a rounding phase; in the FP for convex MINLPs, P1 is a convex NLP and P2 a MILP iteratively updated with Outer Approximation (OA) constraints derived from the optimum of the convex NLP. In both cases one of the subproblems (P1) can be solved in polynomial time; in the FP for convex MINLPs, P2 is NP-hard in general. Extensions of both FPs exist, addressing solution quality in some cases [1] and CPU time in others [9]. The added difficulty in the extension proposed in this paper is that P1 is a nonconvex NLP, and is therefore NP-hard: thus, in our decomposition, both subproblems are difficult, and special care has to be exercised in order to allow FP algorithms to rely only on local optima of the continuous relaxation. A contribution of the present paper is to present FP algorithms as a special case of the well-known Successive Projection Method (SPM).
By doing so we show that many possible different variants of the approach can be developed, depending on how several different (orthogonal) implementation choices are taken. A remarkable twist of FP algorithms is that, unlike most previous SPMs from the literature, projection is "naturally" taken in two different norms in P1 and P2. To cope with this issue while retaining the local convergence properties of standard SPMs, we propose the introduction of appropriate norm constraints in the subproblems, an idea that could be generalized to other nonconvex applications of the SPM. In particular, adding a norm constraint to P1, besides providing nice theoretical convergence properties, actually seems to significantly improve the practical performance of the approach. The rest of this paper is organized as follows. In Sect. 2 we frame the FP algorithm within the class of Successive Projection Methods, describing their convergence properties. In Sect. 3 we discuss the use of different norms within the two subproblems of the FP algorithm. In Sect. 4 we list our solution strategies for both subproblems. In Sect. 5 we present comparative computational results illustrating the efficiency of the proposed approach. Section 6 concludes the paper.
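To make the alternation between P1 and P2 concrete, here is a deliberately tiny 1-D caricature of the MILP feasibility pump (the problem, the function name, and the cycle-escaping rule are illustrative simplifications, not the algorithm of [20] or the one developed in this paper):

```python
import math
import random

def feasibility_pump_1d(lo, hi, x0=0.0, max_iter=100, rng=None):
    """Toy 1-D feasibility pump: find an integer point in [lo, hi].
    P1 is the continuous projection onto [lo, hi] (here a trivial clip),
    P2 is the rounding step; on a detected cycle the rounding direction
    is flipped at random, mimicking the cheap escaping mechanism of the
    original MILP feasibility pump."""
    rng = rng or random.Random(1)
    x = round(x0)
    seen = set()
    for _ in range(max_iter):
        y = min(max(x, lo), hi)  # P1: project the integer point onto [lo, hi]
        x = round(y)             # P2: project the continuous point onto Z
        if lo <= x <= hi:
            return x             # integer and inside [lo, hi]: feasible
        if x in seen:            # cycle detected: randomly flip the rounding
            x = math.floor(y) if rng.random() < 0.5 else math.ceil(y)
        seen.add(x)
    return None

x_feas = feasibility_pump_1d(2.3, 4.7)  # rounding 2.3 down cycles; a flip escapes
```

Starting from 0, the pump reaches the cycle 2 → 2.3 → 2; the random flip eventually rounds 2.3 up to 3, which is feasible.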

2 A view on feasibility pumps

A hitherto seemingly unremarked fact is that FP algorithms are instantiations of a more general class of algorithms, called Successive Projection Methods, for finding a point in a set intersection A ∩ B under the (informal) assumption that "optimization over each set separately is much easier than optimization over the intersection". SPMs were proposed more than 60 years ago (cf. [37]), have found innumerable applications, and have been developed in very many variants: the excellent—and not particularly recent—survey of [5] provides more than one hundred references. The basic idea of SPM is to restate the feasibility problem in terms of the optimization problem

min { ‖z − w‖ | z ∈ A ∧ w ∈ B }.  (2)

Given an initial point w^0, a SPM generates a sequence of iterates (z^1, w^1), (z^2, w^2), ... defined as follows:

z^i ∈ argmin { ‖z − w^{i−1}‖ | z ∈ A }  (3)
w^i ∈ argmin { ‖z^i − w‖ | w ∈ B },  (4)

where on first reading the norm could be assumed Euclidean. It is easy to see that, as discussed below, each iteration improves the objective function of (2), so that one can hope that the sequence will converge to some fixed point (z*, w*) where z* = w*, which therefore provides a positive answer to the feasibility problem. The standing assumption simply says that (3)–(4) are much easier problems than the whole of (2) is, and therefore that the iterative approach may make sense. While in the vast majority of the applications A and B are "easy" convex sets, and it was intended that optimization was to be exact, our nonconvex FP setting also fits under the basic assumption of the approach. Indeed, X is the coarsest relaxation of the feasible region of (1); let C ⊆ {1, ..., m} be the set of constraint indices such that g_i(x, y) is a convex function of (x, y) (note that these do not include the linear defining inequalities of X, if any), and N = {1, ..., m} \ C. We denote the list of all convex constraints by g_C, so that C = {(x, y) | g_C(x, y) ≤ 0} ⊆ R^{p+q} also is a convex relaxation of the feasible region of (1). We also denote by g_N the constraints indexed by N and let N = {(x, y) | g_N(x, y) ≤ 0}. We remark that deciding whether N is empty involves the solution of a nonconvex NLP and is therefore a hard problem. This hardness, by inclusion, extends to the continuous relaxation of the feasible region P = C ∩ N ∩ X. Now, let Z = {(x, y) | x ∈ Z^p}, so that I = C ∩ X ∩ Z is the relaxation of the feasible region involving all the convex and integrality constraints of (1). Deciding emptiness of I involves solving a convex MINLP and is therefore also hard, but for different reasons than P. More specifically, solving nonconvex NLPs globally requires solving nonconvex NLPs locally as a sub-step, whereas solving convex MINLPs involves the solution of convex NLPs (globally) as a sub-step.
The numerical difficulties linked to these two tasks are very different, in particular with respect to the reliability of finding the solution: with nonconvex NLPs, for example, Sequential Quadratic Programming (SQP) algorithms might yield an infeasible linearization step even though the original problem is feasible. It therefore makes sense to decompose F = I ∩ P, the feasible region of (1), into its two components I and P, in order to address each of the difficulties separately. Thus, by taking e.g. A = P and B = I one can fit the FP approach under the generic SPM framework. Note that with this choice the (nonlinear) convex constraints g_C are included in the definition of both P and I (although they can possibly be outer-approximated in the latter, as discussed below). This makes sense since C represents, in this context, an "easy" part of (1): adding it to either set of constraints does not fundamentally change the difficulty of the corresponding problems, while clearly helping to convey as much information about F as possible. Yet, other decompositions could make sense as well. For instance, one may alternatively set B = X ∩ Z in order to keep P2 a linear problem without having to resort to outer approximation techniques (assuming that C only contains the nonlinear convex constraints, with all the linear ones represented in X). Alternatively, B = Z could also make sense, since then P2 actually simplifies to a simple rounding operation (the choice of the original FP [20], cf. §4.2). Thus, different variants of FP for (1) can be devised which can all be interpreted as special cases of SPM for proper choices of A and B. Therefore, in the following we will keep the "abstract" notation with the generic sets A and B whenever we discuss general properties of the approach which do not depend on specific choices within the FP application. Convergence of SPMs has been heavily investigated.
An easy observation is that, directly from (3)–(4),

‖z^{i−1} − w^{i−1}‖ ≥ ‖z^i − w^{i−1}‖ ≥ ‖z^i − w^i‖,

i.e., the sequence given by {δ_i = ‖z^i − w^i‖} is nonincreasing, hence the method is at least locally convergent (in the sense that δ_i → δ_∞ ≥ 0). Global convergence results can be obtained under several different conditions, typically requiring convexity of A and B (so that (2) is a convex feasibility problem) [5]. For instance, the original result of [37] for the case of hyperplanes (where projection (3)–(4) is so easy that it can be accomplished by a closed formula) proves convergence of the whole sequence to the point of A ∩ B closest to the starting point w^0. Convergence results (for the convex case) can be obtained for substantially more complex versions of the approach, although many of these results become particularly significant when the number of intersecting sets is (much) larger than two. For instance, the general scheme analyzed in [5] considers the iteration

v^i = λ_0^i ( (1 − α_0^i) v^{i−1} + α_0^i P_B(v^{i−1}) ) + λ_1^i ( (1 − α_1^i) v^{i−1} + α_1^i P_A(v^{i−1}) )

where α_h^i ∈ [0, 2] are the (over/under)relaxation parameters and λ_h^i ≥ 0 are the weights for h = 0, 1, with λ_0^i + λ_1^i = 1, and

P_A(v̄) ∈ argmin { ‖v − v̄‖ | v ∈ A },  P_B(v̄) ∈ argmin { ‖v̄ − v‖ | v ∈ B }.

Thus, SPMs can allow weighted simultaneous projection and relaxation; we mention in passing that these algorithms bear more than a casual resemblance to subgradient methods [18], as discussed in [5, §7]. The scheme (3)–(4) clearly corresponds to α_h^i = 1 ("unrelaxed") and λ_{(i mod 2)}^i = 1 ("cyclic control"), so that only one of the two projections actually needs to be computed at any iteration (z^i = v^{2i−1} and w^i = v^{2i}). While simultaneous projection is unlikely to be attractive in the FP setting, relaxation is known to improve the practical performance of SPMs in some cases, and it could be considered. All these global convergence results rely on convexity, but in our case the involved sets are far from being convex; convergence of an SPM applied to the nonconvex MINLP is therefore an issue. In order to consider the nonconvex case, the typical strategy is that of recasting SPMs in terms of block Gauss-Seidel approaches applied to the minimization of a block-structured objective function Q(z, w). These approaches, based on the same idea of iteratively minimizing over one block of variables at a time, can be shown to be (locally) convergent under much less stringent conditions than convexity, especially in the two-blocks case of interest here. Different convergence results, under different assumptions, can be found e.g. in [23,34] for the even more general setting where the objective function is

L(z, w) = h(z) + Q(z, w) + k(w),

with h : R^p → R ∪ {+∞} and k : R^q → R ∪ {+∞} proper lower semicontinuous functions, neither convex nor differentiable, and Q : R^{p+q} → R regular, i.e.,

Q′(z, w; d_z, 0) ≥ 0 ∧ Q′(z, w; 0, d_w) ≥ 0 ⟹ Q′(z, w; d_z, d_w) ≥ 0  (5)

for all feasible (z, w), where Q′(z, w; d_z, d_w) is the directional derivative of Q at (z, w) along the direction (d_z, d_w). Smooth functions are regular, while in general nonsmooth ones are not. With stricter conditions on Q (e.g. C^1) the results can be extended and somewhat strengthened [3] to the stabilized version

z^i ∈ argmin { h(z) + Q(z, w^{i−1}) + λ_i ‖z − z^{i−1}‖_2^2 | z ∈ A }  (6)
w^i ∈ argmin { Q(z^i, w) + k(w) + μ_i ‖w − w^{i−1}‖_2^2 | w ∈ B },  (7)

where the penalty terms are added to discourage large changes in the current z, w iterates. Convergence of the method (6)–(7) holds under mild assumptions such as upper and lower boundedness of λ_i and μ_i; the whole sequence (z^i, w^i) is shown to converge to a critical point of L (as opposed to only convergence of subsequences as in [23]). One can then fit the FPs for the nonconvex MINLP case in the above setting by choosing e.g. h = k = 0, Q(z, w) = ‖z − w‖_2^2, λ_i = μ_i = 0, and A, B in any one of the possible ways discussed above. Note that different variants could also be discussed, such as using non-zero h(z) and k(w) to include a weighted contribution of the objective function to try to improve on the quality of the obtained solutions, à la [1]. Sticking to the simplest case, one has that the sequence defined by

(x̄^i, ȳ^i) ∈ argmin { ‖(x, y) − (x̂^{i−1}, ŷ^{i−1})‖ | g(x, y) ≤ 0 ∧ (x, y) ∈ X }  (8)
(x̂^i, ŷ^i) ∈ argmin { ‖(x, y) − (x̄^i, ȳ^i)‖ | g_C(x, y) ≤ 0 ∧ (x, y) ∈ X ∧ x ∈ Z^p }.  (9)

(or any other appropriate splitting of the constraints) converges to a local optimum of the (nonconvex) problem (2). Thus, if δ_i → δ_∞ > 0, then either F = ∅ or the algorithm has converged to a critical point which is not a global minimum. Telling these two cases apart is unfortunately difficult in general; however, because we are proposing a MINLP heuristic, rather than an exact algorithm, we shall typically assume the latter case holds, and we shall therefore employ some Global Optimization (GO) techniques to reach a putative global optimum. It should be remarked that (3)–(4) is only the most straightforward application of SPM in our setting. A number of issues arise:

– The algorithm alternates between solving the nonconvex NLP (8) and the convex MINLP (9). In order to retain the local convergence property, both problems would need to be solved exactly: a difficult task in both cases.
– Since (9) is a mixed-integer program, it would be very attractive to be able to use the efficient available MILP solvers to tackle it. However, in order to do that one would, as the very first step, need to substitute the Euclidean norms with "linear" ones (L1, L∞).
– In the standard FP approach [8,20] the distance is actually only measured on the integer (x) variables, as opposed to the full pair (x, y).
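To fix ideas, the alternating scheme (8)–(9) in its simplest rounding variant (B = Z) can be sketched on a toy instance. Everything here is illustrative: the disk-shaped feasible set, the closed-form projection, and all function names are stand-ins for the actual NLP and MILP solves, not the solvers used in the paper.

```python
import math

# Toy instance: g(x, y) = (x - 1.3)^2 + y^2 - 1 <= 0 (a disk),
# with x required to be integer. Both projections are closed-form here.
CX, CY, R = 1.3, 0.0, 1.0

def g(x, y):
    return (x - CX) ** 2 + (y - CY) ** 2 - R ** 2

def project_continuous(x, y):
    """Stand-in for (8): Euclidean projection of (x, y) onto the disk."""
    d = math.hypot(x - CX, y - CY)
    if d <= R:
        return x, y
    return CX + R * (x - CX) / d, CY + R * (y - CY) / d

def project_integer(x, y):
    """Stand-in for (9) with B = Z: round the integer-constrained x."""
    return round(x), y

def feasibility_pump(x0, y0, max_iter=50):
    xh, yh = x0, y0
    for _ in range(max_iter):
        xb, yb = project_continuous(xh, yh)   # "solve" (8)
        xh, yh = project_integer(xb, yb)      # "solve" (9)
        if g(xh, yh) <= 1e-9:                 # integer point inside the disk
            return xh, yh
    return None                               # cycling / no success

print(feasibility_pump(3.0, 0.0))  # -> (2, 0.0)
```

Starting from (3, 0), a single round of the two projections already lands on the feasible mixed-integer point (2, 0); on harder nonconvex instances this loop can stall at a fixed point, which is exactly the issue the rest of the paper addresses.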

In the rest of the paper, we discuss several modifications to this approach in order to address the above issues and better exploit the structure of the problem at hand.

3 Using different norms

In this section we consider employing two different norms ‖·‖_A and ‖·‖_B in the two subproblems (3)–(4):

z^i ∈ argmin { ‖z − w^{i−1}‖_A | z ∈ A }  (10)
w^i ∈ argmin { ‖z^i − w‖_B | w ∈ B }.  (11)

The Euclidean norm is appropriate in (8) because of its smoothness property and because (8) is already nonlinear. In the case of (9), however, the L1 or L∞ norms yield a convex MINLP whose objective function can be linearized by means of standard techniques [28]. Provided the constraints indexed by C are linear, or they are outer linearized, (9) then becomes a MILP, which allows use of the available efficient off-the-shelf general-purpose solvers. Replacing the norm in (9), however, prevents us from establishing monotonicity of the sequence {δ_i | i ∈ N}: assuming ‖·‖_A = L2 and (say) ‖·‖_B = L∞, for example, one uses ‖·‖_A ≥ ‖·‖_B to derive

‖z^i − w^i‖_A ≥ ‖z^{i+1} − w^i‖_A ≥ ‖z^{i+1} − w^i‖_B ≥ ‖z^{i+1} − w^{i+1}‖_B,

but nothing ensures ‖z^{i+1} − w^{i+1}‖_B ≥ ‖z^{i+1} − w^{i+1}‖_A. A way to deal with this case is to replace (10) by

z^i ∈ argmin { ‖z − w^{i−1}‖_A | z ∈ A ∧ ‖z − w^{i−1}‖_B ≤ β ‖z^{i−1} − w^{i−1}‖_B },  (12)

for β ∈ (0, 1]. This implies monotonicity for all β ≤ 1 and strict monotonicity for all β < 1:

‖z^i − w^i‖_B ≥ β ‖z^i − w^i‖_B ≥ ‖z^{i+1} − w^i‖_B ≥ ‖z^{i+1} − w^{i+1}‖_B.

For ‖·‖_B = L∞, the required reformulation of (10) only amounts to restricting variable bounds; as this restricts the feasible region, it is also expected to facilitate the task of solving (12). Local convergence, that is, no cycling, of the sequence of iterates generated by the FP algorithm given by (12) and (11) can be established by ensuring that the sequence ‖z^i − w^i‖_B converges. A convergence failure might occur if (12) becomes infeasible because of the restricted variable bounds.¹ Infeasibility of (12) shows that

min { ‖z − w^{i−1}‖_B | z ∈ A } ≥ β ‖z^{i−1} − w^{i−1}‖_B,

which in turn implies that, for β ≈ 1, z^{i−1} is a good candidate for a local minimum, leading to the choice z^i = z^{i−1}. If the mixed-integer iterate (11) also cannot improve the objective, then (z^i, w^i) can be assumed to be a local minimum; this case will be dealt with later. We remark that the fact that

∀ z ∈ A : ‖z − w̄‖_B ≥ ‖z̄ − w̄‖_B  (13)
∀ w ∈ B : ‖z̄ − w‖_B ≥ ‖z̄ − w̄‖_B

is not sufficient to ensure that (z̄, w̄) is a local minimum: this is because ‖·‖_B is not regular in the sense of (5) if B ∈ {1, ∞}. Indeed, it is clear that for Q(z, w) = g(z − w), one has Q′(z, w; d_z, d_w) = g′(z − w; d_z − d_w), which means that Q′(z, w; d_z, 0) = g′(z − w; d_z) and Q′(z, w; 0, d_w) = g′(z − w; −d_w). Thus, for (z, w) such that ‖·‖_B is not differentiable at z − w, it is not difficult to construct a counterexample to (5). One is shown in Fig. 1 for the case B = 1, where Q′(z, w; d_z, 0) = Q′(z, w; 0, d_w) = 0, but Q′(z, w; d_z, d_w) = −1.

Example 1 Based on Fig. 1, we construct an example of nonconvergence of the FP with ‖·‖_A = L2 and ‖·‖_B = L1. Let A = {(x, y) | x ≥ 3} and B = {(x, y) | 2x ≤ y}, and consider z̄ = (3, 4) and w̄ = (2, 4). It is easy to verify that

z̄ ∈ argmin { ‖(x, y) − (2, 4)‖_2 | (x, y) ∈ A }
w̄ ∈ argmin { ‖(3, 4) − (x, y)‖_1 | (x, y) ∈ B },

which implies that (z̄, w̄) is a fixed point for the sequence generated by the FP. However, (z̄, w̄) is not a local minimum of (2): by moving a step of length 2 along the feasible direction (d_z, d_w) = ((0, 1), (1/2, 1)) we obtain z′ = (3, 6) and w′ = (3, 6), and ‖z′ − w′‖_1 = ‖z′ − w′‖_2 = 0 < 1 = ‖z̄ − w̄‖_1 = ‖z̄ − w̄‖_2.
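The arithmetic of Example 1 is easy to check numerically; the following self-contained sketch verifies the fixed-point distances in closed form, and uses a coarse grid search (an illustrative device, not part of the example) to confirm that no point of B is L1-closer to z̄ than w̄.

```python
# Numerical check of Example 1: (z_bar, w_bar) is a fixed point of the
# mixed-norm iteration (10)-(11) but not a local minimum of (2).
def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def l2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

z_bar, w_bar = (3.0, 4.0), (2.0, 4.0)

# A = {x >= 3}: the L2-projection of w_bar = (2, 4) onto A is (3, 4) = z_bar.
assert l2(z_bar, w_bar) == 1.0

# B = {2x <= y}: w_bar is at L1-distance 1 from z_bar; a grid search over
# B confirms that no point does better.
best = min(l1(z_bar, (i / 10, j / 10))
           for i in range(0, 41) for j in range(0, 81)
           if 2 * (i / 10) <= j / 10)
assert l1(z_bar, w_bar) == 1.0 and best >= 1.0 - 1e-9

# A step of length 2 along (d_z, d_w) = ((0, 1), (1/2, 1)) reaches
# z' = w' = (3, 6), feasible for both sets, with distance 0 in both norms.
z_new = (z_bar[0] + 2 * 0.0, z_bar[1] + 2 * 1.0)
w_new = (w_bar[0] + 2 * 0.5, w_bar[1] + 2 * 1.0)
assert z_new == w_new == (3.0, 6.0)
assert z_new[0] >= 3 and 2 * w_new[0] <= w_new[1]
assert l1(z_new, w_new) == 0.0 == l2(z_new, w_new)
print("fixed point, but not a local minimum of (2)")
```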

¹ These failures happen in practice, as shown in Sect. 5.4.


Fig. 1 Non-regularity of ‖·‖_1

Hence, the modification (12) of the FP still guarantees convergence of the sequence {δ_i}, and therefore (at least for β < 1) ensures that no cycling can occur. Convergence may occur to a local minimum when using "nonsmooth" norms such as L1 and L∞ even if A and B were convex, but this is not a major issue since the sets are nonconvex, and therefore there is no guarantee of convergence to a global minimum anyway. Other mechanisms in the algorithm (cf. §4.2) are designed to take care of this.
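The standard linearization of the L1 objective referred to above can be sketched explicitly: minimizing ‖x − x̂‖_1 is replaced by minimizing Σ_j t_j subject to t_j ≥ x_j − x̂_j and t_j ≥ x̂_j − x_j. The helper below just builds this reformulation as callables; names and the tiny instance are illustrative, assuming a generic LP/MILP solver would consume the constraints.

```python
def linearize_l1_objective(x_hat):
    """Rewrite min ||x - x_hat||_1 as min sum(t) with t_j >= x_j - x_hat_j
    and t_j >= x_hat_j - x_j. Constraints are callables over the stacked
    vector (x, t); a value >= 0 means the constraint is satisfied."""
    p = len(x_hat)
    cost = lambda xt: sum(xt[p:])  # sum of the auxiliary t_j
    cons = []
    for j in range(p):
        cons.append(lambda xt, j=j: xt[p + j] - (xt[j] - x_hat[j]))
        cons.append(lambda xt, j=j: xt[p + j] - (x_hat[j] - xt[j]))
    return cost, cons

# At the natural choice t_j = |x_j - x_hat_j| the reformulated objective
# equals the L1 norm and all auxiliary constraints hold.
x_hat = [0.4, 0.6]
x = [1.0, 0.6]
t = [abs(a - b) for a, b in zip(x, x_hat)]
cost, cons = linearize_l1_objective(x_hat)
assert all(c(x + t) >= -1e-12 for c in cons)
assert abs(cost(x + t) - sum(abs(a - b) for a, b in zip(x, x_hat))) < 1e-12
print(cost(x + t))  # approximately 0.6
```

Since the cost and constraints are all affine, feeding them to a MILP solver keeps (9) linear; an analogous construction with a single auxiliary variable t ≥ ±(x_j − x̂_j) for all j handles the L∞ case.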

3.1 Partial norms

A structural property of the specific nonconvex MINLP setting is that whenever z = (x, y) ∈ A has the property that there exists some z̃ = (x, ỹ) ∈ B, then z ∈ F; in other words, the difficulty of optimizing over B is given by the integer constrained variables x. Thus, for our purposes we can consider

(x̄^i, ȳ^i) ∈ argmin { ‖x − x̂^{i−1}‖ | (x, y) ∈ A }  (14)
(x̂^i, ŷ^i) ∈ argmin { ‖x̄^i − x‖ | (x, y) ∈ B }  (15)

instead of (10)–(11). This means that we need only consider the distance between the projections of A and B on the x-subspace, as opposed to the distance between the full (x, y) iterates. This does not have any significant impact on the approach. From the theoretical viewpoint, it just says that the function Q(z, w) in (6)–(7) is constant (hence, smooth) on a subset of the variables, i.e.,

Q((x̄, ȳ), (x̂, ŷ)) = ‖x̄ − x̂‖.

Things are, in theory, rather more complex when two different norms ‖·‖_A and ‖·‖_B are used in (14)–(15) (respectively), since then there is no longer a well-defined function Q to minimize. However, this is also the case when using different norms on the whole iterates, as advocated in the previous section. From the practical viewpoint, using different partial norms only amounts to the fact that the norm constraint in (12) actually reads

‖x − x̂^{i−1}‖_B ≤ β ‖x̄^{i−1} − x̂^{i−1}‖_B  (16)

which of course is still enough to guarantee the (strict, if β < 1) monotonicity of the sequence {δ_i}. This still provides all the local convergence properties that the FP requires.
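As noted earlier, when ‖·‖_B = L∞ the norm constraint (16) amounts to nothing more than tightening the bounds of the integer-constrained variables. A minimal sketch of that bound-tightening step (function name and data are illustrative):

```python
def tighten_bounds_linf(lb, ub, x_hat, x_bar, beta=1.0):
    """Impose ||x - x_hat||_inf <= beta * ||x_bar - x_hat||_inf, as in
    constraint (16) with the L-infinity norm, by shrinking variable bounds."""
    r = beta * max(abs(a - b) for a, b in zip(x_bar, x_hat))
    new_lb = [max(l, h - r) for l, h in zip(lb, x_hat)]
    new_ub = [min(u, h + r) for u, h in zip(ub, x_hat)]
    return new_lb, new_ub

lb, ub = [0.0, 0.0], [10.0, 10.0]
x_hat, x_bar = [2.0, 3.0], [4.0, 3.5]   # previous FP iterates
new_lb, new_ub = tighten_bounds_linf(lb, ub, x_hat, x_bar)
# radius = max(|4 - 2|, |3.5 - 3|) = 2, so x_1 in [0, 4] and x_2 in [1, 5]
assert new_lb == [0.0, 1.0] and new_ub == [4.0, 5.0]
```

If the tightened box becomes empty or cuts off all of A, that is exactly the infeasibility of (12) discussed in Sect. 3, signaling a candidate local minimum.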

4 Approximate solution of the subproblems

The convergence theory for SPMs would require solving (8) and (9) to global optimality. As already remarked, this is extremely challenging and not very likely to be effective in the context of what, overall, remains a heuristic method, which at any rate does not provide any theoretical guarantee of success. Furthermore, even if the subproblems were actually solved to global optimality, several variants of the FP approach (most notably, those employing two different norms) would still not entirely fit into the theoretical framework for which convergence proofs are readily available. This frees us to consider several different options and strategies to solve both (8) and (9), as discussed in this section, which give rise to "a storm" of many different configurations that we extensively tested computationally. The results are reported in Sect. 5, either in detail for the most successful algorithms or in summary for the unsuccessful ones.

4.1 Addressing the nonconvex NLP (8)

As already mentioned, solving (8) to global optimality is difficult by itself, mainly because of the nonconvex constraints. Indeed, if only convex constraints were present, every local optimum would be guaranteed to be also a global optimum. For this reason, considering both the convex and nonconvex constraints does not make the subproblem much harder in practice than only considering the nonconvex ones. Although applying GO techniques to obtain a provably optimal solution would likely be too time consuming, we still attempt to solve the problem globally by using two different approaches:

1. a simple stochastic multi-start approach [33] in which the NLP solver is provided with different randomly generated starting points in order to try to escape from possible local minima;
2. a Variable Neighborhood Search (VNS) [24] scheme [29,30].

In general, finding any feasible solution for (8) such that ‖x̄ − x̂^{i−1}‖_A < ‖x̄^{i−1} − x̂^{i−1}‖_B is enough to retain the monotonicity property of the sequence; thus, the solution process for (8) can be terminated as soon as this happens. Failure to obtain this condition may lead to declaring the failure of a local phase, without identifying a feasible solution, even if one could be found if a globally optimal (or, at least, better) solution for (8) were determined, as shown in Fig. 2. In particular, we consider a nonconvex MINLP in two variables: on the x-axis we have an integer variable, while on the y-axis a continuous one. The gray part is the feasible region of the NLP relaxation of P, while the set of bold lines represents the feasible region of P. The symbols x̂^i represent solutions of (9), where i is the iteration number; similarly, the symbols x̄^i represent solutions of (8).

Fig. 2 Solving (8) heuristically (left) or to global optimality (right): helping to prevent cycling

The figure shows that only requiring local optimality in (8) can lead to cycling: on the left, the local optimum x̄^1 of (8) does not allow the algorithm to proceed to x̄^2 and eventually to the feasible MINLP solution x̂^2 (on the right). However, sometimes both strategies 1 and 2 might fail to find a feasible solution for (8) (for example due to a time limit, see Sect. 5), and that can happen even if they claim the returned solution is NLP feasible. In such a case we experimented with two options:

a. we define (9) by using any infeasible solution of (8);
b. we fix the integer variables x and we solve again (locally) a modified version of problem (8) in which the objective function is replaced by the original objective of (1).

Fig. 3 Solving (8) without (left) and with (right) the fixing strategy b: accelerating convergence

The fixing strategy b might improve convergence speed, as shown in Fig. 3: if x_1 is fixed to the value given by x̂^1, then the next NLP solution is likelier to directly lead to the feasible MINLP region.

Finally, as discussed in Sect. 3, one might implement the overall FP scheme by using the Euclidean norm for (8) and a different norm in (9), like L1 or L∞, to simplify it. As previously discussed, this may well impair the only remaining (weak) convergence property of the approach, i.e., monotonicity of the sequence {δ_i}, making it harder to declare that a "local optimum" has been reached. For this case, two options can be implemented:

i. we forget about such a difference in norms and we hope for the best;
ii. we amend (8) with the norm constraint (16), and solve it as usual. We remark here that preliminary computational experiments have shown that the value of β does not strongly influence the results; thus we used β = 1 in the computational results of Sect. 5.

In summary, three main decisions have to be taken to define and solve (8):

I. solution algorithm: multi-start (1. above) versus VNS (2. above), II. additional fixing step: NO (a. above) versus YES (b. above), and III. norm correction: NO (i. above) versus YES (ii. above).

4.2 Addressing the convex MINLP (9)

The first decision that has to be taken for addressing problem (9) concerns the norm to use in the objective function, i.e., how to formulate (9) in practice.

1. Of course, the most trivial option is to keep the Euclidean norm, so that (9) is a convex MINLP.
2. As discussed in Sect. 3, the main alternative is to employ either the L1 or the L∞ norm in the objective function so that it can be linearly reformulated in standard ways (via the introduction of a few auxiliary continuous variables). This is an attempt to replace (9) with a MILP relaxation, because MILP solution technology is currently more advanced than its convex MINLP equivalent. This, however, requires the constraints to be linearized as well, which can be done by means of standard Outer Approximation approaches. That is, assuming C contains only nonlinear convex constraints (the linear ones being left in X), one can approximately solve (9) at the generic iteration i ∈ N by means of its MILP relaxation

min ‖x̄^i − x‖_B  (17)
g_ℓ(x̄^k, ȳ^k) + ∇g_ℓ(x̄^k, ȳ^k)^T ((x, y) − (x̄^k, ȳ^k)) ≤ 0,  ℓ ∈ C̄^k, k ≤ i  (18)
(x, y) ∈ X, x ∈ Z^p  (19)

where the norm ‖·‖_B in (17) can be either L1 or L∞ and C̄^k ⊆ C is the set of convex nonlinear constraints that are active at (x̄^k, ȳ^k). In other words, one keeps collecting the classical Outer Approximation cuts [19] (18) along the iterations and uses them to define a polyhedral outer approximation of I. Note that while


Fig. 4 Solution of (9) without (left) and with (right) OA cuts: helping to prevent cycling

(18) could seem to require that each g_ℓ for ℓ ∈ C be a differentiable function, this is only assumed for the sake of notational simplicity: notoriously, subgradients of nondifferentiable convex functions can be used as well (e.g. [17]). In both cases, we employ partial norms as detailed in Sect. 3.1, so as to take into account in the objective function only the integer variables. The second decision is how to formulate the feasibility space of problem (9), i.e., how to deal with the original set of constraints, relaxing them if needed. This depends on the first decision as well, i.e., on which objective function, either 1. or 2. above, has been selected. Of course, such a decision is inherently linked to the solution algorithm.

a. If the Euclidean norm is used in (9), then we investigate three options:
1. we solve the convex MINLP as is by means of a sophisticated general-purpose MINLP solver, in our case the Bonmin solver [10];
2. we solve a convex mixed-integer quadratic problem (MIQP) relaxation of the MINLP. Precisely, the MIQP is obtained by using the objective function² min ‖x̄^i − x‖_2^2 instead of (17), but with the same set of (linear) constraints (18)–(19). This is done to simplify the problem and to be able to use a sophisticated general-purpose MIQP solver, in our case CPLEX [25];
3. we remove all constraints (18)–(19), only keeping x ∈ Z^p and bound constraints, and solve (9) by rounding. This is in the spirit of both [20] and [9].

b. If instead the L1/L∞ norm is used and the MILP relaxation (17)–(19) is defined, we solve the MILP as is by means of a sophisticated general-purpose MILP solver, in our case CPLEX [25].

The third decision is how to address the issue of cycling. Indeed, because problem (9) only takes into account the subset of convex constraints (or a relaxation of them in the MILP case), the resulting FP algorithm might cycle, i.e., visit the same mixed-integer solution more than once.
Note that if (1) is instead a convex MINLP, OA cuts are shown to be sufficient to guarantee that the FP algorithm does not cycle [8], as shown for example in Fig. 4.
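Generating an OA cut of the form (18) is mechanical once a convex constraint and a linearization point are available. The sketch below is an illustrative stand-in (the constraint, the point, and the function names are toys, not the ROSE implementation); it also checks the key property exploited by the FP: by convexity the cut underestimates g everywhere, so it never removes feasible points, while it can cut off a previously visited infeasible integer point.

```python
import math

def oa_cut(g, grad, point):
    """Build an OA cut in the spirit of (18): the affine underestimator
    x -> g(point) + grad(point) . (x - point) of a convex function g."""
    g0, d0 = g(point), grad(point)
    return lambda x: g0 + sum(di * (xi - pi)
                              for di, xi, pi in zip(d0, x, point))

# Toy convex constraint g(x, y) = x^2 + y^2 - 1 <= 0, linearized at a
# point where the constraint is active (g = 0), as for the set C̄^k.
g = lambda v: v[0] ** 2 + v[1] ** 2 - 1.0
grad = lambda v: (2.0 * v[0], 2.0 * v[1])
p = (math.sqrt(0.5), math.sqrt(0.5))
cut = oa_cut(g, grad, p)

# The linearization underestimates g everywhere (validity) and is tight
# at the linearization point.
samples = [(0.0, 0.0), (0.5, -0.5), (-1.0, 2.0), p]
assert all(cut(v) <= g(v) + 1e-9 for v in samples)
assert abs(cut(p) - g(p)) < 1e-12
# The integer point (1, 1), infeasible for g, violates the cut (cut > 0),
# so adding the cut to the MILP relaxation removes it.
assert cut((1.0, 1.0)) > 0.0
```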

² Note that again the objective function is defined in the MIQP case only on the integer variables.

Fig. 5 The OA cut from γ does not cut off x̂



Fig. 6 OA cuts may not prevent the FP from cycling

In the nonconvex case, however, OA cuts are not enough, as discussed in Example 2. In addition, in the testbed we used to computationally test our approach, the number of OA cuts we could generate is somewhat limited, as discussed in detail in Sect. 5.1.

Example 2 In Fig. 5 a nonconvex feasible region and its current linear approximation are depicted. Let x̄ be the current solution of subproblem (8). In this case, only one OA cut can be generated, i.e., the one corresponding to the convex constraint γ. However, it does not cut off x̂, i.e., the solution of (9) at the previous iteration. In this example, the FP would not immediately cycle, because x̂ is not the solution of (9) which is closest to x̄. This shows that there is a distinction between cutting off and cycling. In general, however, failure to cut off previously visited integer solutions might lead to cycling, as shown in Fig. 6.

One elegant possibility to prevent cycling is that of adding no-good cuts at iteration i to make (x̂^k, ŷ^k) infeasible for all k < i. This is possible if (as happens in some of the variants) any of the minimum distance problems is solved (even if only approximately) with an exact approach, which not only provides good feasible solutions, but also a lower bound on the optimal value of the problem to provide a guarantee of accuracy. Indeed, if the solution method proves that the inequality

‖x − x̂^i‖_A ≥ ε  (20)

is satisfied for all (x, y) ∈ A, then one has

A ∩ B = (A ∩ {(x, y) : (20)}) ∩ B = A ∩ (B ∩ {(x, y) : (20)}).

In other words, the nonlinear and nonconvex "cut" (20) can be added to B without changing the feasible set of the problem. The interesting part is that, of course, x̂^i violates (20), and therefore (20) provides, at least in theory, a convenient globalization mechanism. No-good cuts [17] were originally introduced in [4] under the name of "canonical" cuts and recently used within the context of MINLP [17,32]. If x are binary variables and ‖·‖ is the L1 norm, we can take ε = 1 and reformulate (20) linearly as

∑_{j ≤ p : x̂_j = 0} x_j + ∑_{j ≤ p : x̂_j = 1} (1 − x_j) ≥ 1.
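For binary x this cut is easy to build and its effect can be verified by brute force: it removes exactly the point x̂ from {0, 1}^p and nothing else. A minimal sketch (function name illustrative):

```python
from itertools import product

def no_good_cut(x_hat):
    """Linear no-good ("canonical") cut for a binary point x_hat:
    sum_{j: x_hat_j = 0} x_j + sum_{j: x_hat_j = 1} (1 - x_j) >= 1.
    Returns a predicate that is True iff x satisfies the cut."""
    return lambda x: sum(x[j] if x_hat[j] == 0 else 1 - x[j]
                         for j in range(len(x_hat))) >= 1

x_hat = (1, 0, 1)
cut = no_good_cut(x_hat)
# Enumerating {0, 1}^3 shows the cut removes exactly x_hat.
violated = [x for x in product((0, 1), repeat=3) if not cut(x)]
assert violated == [x_hat]
print(len(violated))  # -> 1
```

The left-hand side counts the number of coordinates in which x differs from x̂, which is 0 only at x̂ itself; this is why the cut is valid for every other binary point.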

For general integer variables, an exact linear reformulation is given, for example, in [17]; it involves adding 2p new continuous variables, p new binary variables and 3p + 1 linear equations to the problem. Thus, the size of such a reformulation could rapidly become prohibitive in the context of an iterative method like FP. This is why no-good cuts are used in a limited form in our scheme and we instead implement two alternative, less elegant, approaches:

i. We employ a tabu list in order to prevent the MILP solver from finding the same solutions (x̂, ŷ) at different iterations.
ii. We configure our solver to find a pool of solutions from which we choose the best non-forbidden one.

Clearly, the issue of preventing the FP scheme from cycling is not confined to the solution of problem (9) but is more of a globalization strategy. Indeed, problem (8) could in turn be amended with no-good cuts of the form ‖x − x̂^{i−1}‖_2^2 ≥ ε, which are reverse-convex constraints not different from those already in (8). However, we decided to concentrate our attention on (9) for two reasons. On the one hand, this is the way both previous FP algorithms worked, namely the one for MILP, through random flipping of the rounding step, and that for convex MINLP, by means of OA cuts. On the other hand, the value to be assigned to ε would be any lower bound greater than 0 on the optimal value of (9). However, we never really solve such a problem to optimality, and in at least one case, the rounding option 3 above, we do not compute any lower bound either. In summary, three main decisions have to be taken to define and solve (9):

I. the norm to be used in the formulation of (9): L2 (1. above) versus L1/L∞ (2. above),

II. how to define the feasible region of (9) and solve it: MINLP (1 above) versus MIQP (2 above) versus rounding (3 above) or MILP (b above), and
III. how to avoid cycling: tabu list (i. above) versus solution pool (ii. above).

5 Computational results

In this section we discuss the outcome of our extensive computational investigation.

5.1 Computational setting

The algorithms were implemented within the AMPL environment [22]. We chose this framework to make it easy to change subsolvers. In practice, the user can select the preferred solver to solve NLPs, MINLPs, MIQPs or MILPs, exploiting their respective advantages. We also use a new solver/reformulator called ROSE (Reformulation Optimization Software Engine, see [27,28]), of which we exploit the following features:

– Model analysis: getting information about nonlinearity and convexity of the constraints and integrality requirements of the variables, so as to define subproblems (8) and (9).
– Solution feasibility analysis: necessary to verify feasibility of the provided solutions.
– OA cut generation: necessary to update (9). In order to determine whether a constraint is convex, ROSE performs a recursive analysis of its expression tree [26] to determine whether it is an affine combination of convex functions. We call such a function "evidently convex" [28]. Evident convexity is a stricter notion than convexity: evidently convex functions are convex but the converse may not hold. Thus, it might happen that a convex constraint is labeled nonconvex; the information provided is in any case safe for our purposes, i.e., we generate OA cuts only from constraints which are certified to be convex. Unfortunately, the number of problems in the testbed (see next section) in which we are able to generate OA cuts is limited, around 15% of them, likely because of such a conservative (but safe) policy adopted by ROSE.

5.2 FP variants and preliminary results

Because of the multiple options which can be put in place to solve both (8) and (9), we had to implement and test more than twenty FP versions/variants to assess the effectiveness of each of the algorithmic decisions discussed in the two previous sections. Some of these options were ruled out after a preliminary set of experiments involving 243 MINLP instances from MINLPlib [12], used, among others, in [16,30]. Only 65 of these 243 instances are such that the open-source Global Optimization solver COUENNE 0.1 [6] (available from COIN-OR [14]) is unable to find a feasible solution within a time limit of 5 min on an Intel Xeon 2.4 GHz with 8 GB RAM running Linux. Thus, the goal of the preliminary set of computational experiments was twofold. On the one hand, we wanted to be quick and competitive on the "easy" instances, i.e., the 178 instances on which COUENNE is able to find a solution within 5 min of CPU time. This is because FP can clearly be used as a stand-alone heuristic algorithm for nonconvex MINLP, and must be competitive with a general-purpose solver used as a heuristic as well, i.e., truncated within a short time limit. That was achieved by the "best" FP versions/variants that will be discussed in the remainder of the section. To give an example, the version denoted as FP-1 (see Sect. 5.4) finds a feasible solution for 156 of the 178 "easy" instances within 5 min, encounters numerical troubles in 13 of them (because of the NLP solver) and requires more than 5 min in the remaining 9 instances. Because COUENNE 0.1 (like most GO solvers) mainly implemented simple heuristics based on reformulations and linearizations, it would have been relatively easy to recover those few instances (9 of them) with longer computing times by ad-hoc policies.
On the other hand, we wanted to be effective (possibly with longer computing times) on the 65 "hard" instances where simple heuristics and partial enumeration failed. In particular, FP should be effective on the instances in which the nonlinear aspects of the problems play a crucial role, thus suggesting its fruitful integration within COUENNE or any other GO solver (as happened for FP algorithms in MILP). Indeed, the current trunk version of COUENNE is more sophisticated in terms of heuristics also due to our investigation preliminarily reported in [15,16], and some results at the end of Sect. 5.4 seem promising in this respect. The FP variants which did not "survive" the preliminary tests³ are those that, at the same time, did not perform particularly well on the "easy" instances and did not add anything special on the "hard" ones. Namely,

1. Solving (8) by VNS was always inferior to solving it by the stochastic multi-start approach. Such a poor performance of the VNS approach might be due to its iterative implementation within AMPL: at each iteration, a different search space is defined, starting from a small one and enlarging it so that at the last iteration the entire feasible region is considered. In particular, this approach seems to be too "conservative" with respect to the previous solution.
2. The additional fixing step which can be performed in case of failure when solving (8) by fixing the integer variables has a slight positive effect when the norm constraint is added, while it turns out to be crucial when it is not. In a sense, the theoretical convergence guaranteed by the use of norm constraints seems to make problems (8) easier, so the benefit of the fixing step is particularly high if such constraints are not added. We then decided to always include the fixing step as well.
3. In case the Euclidean norm is kept in problem (9), we decided to solve the convex MIQP instead of the convex MINLP. The main reason (besides some technical issues related to modifying a convex MINLP solver like Bonmin to implement mechanisms to prevent cycling) is that the number of evidently convex constraints as

³ No detailed computational results are reported here for the preliminary computational investigation; some of them are discussed in detail in [16].

discovered by ROSE is very limited in the testbed. Thus, if the constraints in (9) are linear, then the MIQP solver of CPLEX is clearly more efficient than a fully general convex MINLP solver like Bonmin.
4. Preventing cycling by using a pool of solutions was always inferior to using the tabu list. Again, this might be due to the lack of flexibility of the (nice) solution pool feature of CPLEX 11 that we used in our experiments. Every time we need to solve (9), we ask CPLEX to produce a number of solutions equal to the number of tabu list solutions plus one. Once the solution pool is obtained, we analyze the solutions starting from the first and set (x̂i, ŷi) as the first solution of the pool which is not present in the tabu list. However, we have to consider the two following drawbacks: (i) the solution pool is populated after the branch and bound is finished. Because we have a time limit for solving (9), it is not guaranteed that we would have a number of solutions sufficient to provide a non-forbidden solution (especially because providing a solution pool is a time-consuming feature); (ii) we cannot force CPLEX to measure the diversity of the solutions in the pool by neglecting the continuous part of the problem. Unfortunately, CPLEX can provide us with a set of solutions which have the same integer values but different continuous values. More generally, it might happen that only forbidden solutions are generated, for example if the continuous relaxation of (9) is integer feasible but forbidden. In this case the solution would be discarded, but no further solution can be generated.

Due to the above discussion, the only surviving subproblems to be solved are nonconvex NLPs and convex MILPs and MIQPs. The NLP solver used is IPOPT 3.5 trunk [13], while the MILP and MIQP solvers are those of CPLEX 11 [25]. Before ending the section we need to specify two more implementation details for the surviving FP variants.
Implementing a tabu list in CPLEX. Discarding a solution in the tabu list within the CPLEX branch and bound is possible using the incumbent callback function. The tabu list is stored in a text file which is then exchanged between AMPL and CPLEX. Every time CPLEX finds an integer feasible solution, a specialized incumbent callback function checks whether the new solution appears in the tabu list. If this is the case, the solution is rejected; otherwise, the solution is accepted. CPLEX continues executing until either the optimal solution (excluding those forbidden) is found or a time limit is reached. In the case where an integer solution found by CPLEX at the root node appears in the tabu list, CPLEX stops and no new integer feasible solution is provided to FP⁴. In such a case, we amend problem (9) with a no-good cut [17] which excludes the solution and we call CPLEX again.
Avoiding cycling when solving (9) by rounding. When the MILP relaxation of (9) is solved by rounding to the nearest integer the fractional values of vector x̄, the methods for preventing cycling cannot be implemented in the way we described above. The method adopted is taken from the original FP paper [20]: whenever a forbidden solution is found, the algorithm randomly flips some of the integer values so as to obtain a new solution.
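The two anti-cycling devices just described can be sketched in a solver-agnostic way. This is only an illustration of the logic (the tabu check mimics what the incumbent callback does inside CPLEX, and the random flip is the device from the original FP paper); the function names and data layout are ours, not CPLEX's API.

```python
# Sketch of the two anti-cycling devices: tabu-list rejection of integer-feasible
# candidates, and the random-flip escape used when (9) is solved by rounding.
import random

def is_tabu(y_int, tabu_list):
    """Compare only the integer part of a candidate against stored solutions."""
    return any(y_int == t for t in tabu_list)

def escape_by_flipping(y_int, n_flips, rng):
    """Randomly flip some binary entries so as to leave a forbidden solution."""
    y_new = list(y_int)
    for i in rng.sample(range(len(y_new)), n_flips):
        y_new[i] = 1 - y_new[i]
    return y_new

# Toy usage: a forbidden candidate is perturbed into a new one.
rng = random.Random(0)
tabu = [[0, 1, 1], [1, 1, 0]]
cand = [0, 1, 1]
if is_tabu(cand, tabu):
    cand = escape_by_flipping(cand, n_flips=1, rng=rng)
```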

⁴ Note that this is the same issue discussed in the solution pool case.

Table 1 FP variants

Variant   Problem (8)                                   Problem (9)
          Algorithm     Fixing step  Norm constraint    Norm  Algorithm  Cycling

FP-1      multi-start   YES          YES                L1    MILP       tabu list
FP-2      multi-start   YES          NO                 L1    MILP       tabu list
FP-3      multi-start   YES          YES                L2    Rounding   tabu list
FP-4      multi-start   YES          N/A                L2    MIQP       tabu list
FP-5      multi-start   YES          YES                L∞    MILP       tabu list
FP-6      multi-start   YES          NO                 L∞    MILP       tabu list

Table 2 Comparing the six FP variants, aggregated results

                      FP-1    FP-2    FP-3   FP-4   FP-5   FP-6

Successes             49      45      22     23     44     46
Successes alone       0       3       0      0      1      1
Time limit reached    11      14      42     32     2      12
Fails                 5       6       1      9      19     7
Wins                  26      20      10     4      10     8
Time geomean          151.02  104.45  17.59  76.14  23.25  14.99

5.3 Code tuning

The algorithm terminates after the first MINLP feasible solution is found or a time limit is reached. The parameters are set in the following way: the time limit is 2 h of user CPU time, the absolute feasibility tolerance to evaluate constraints is 1e-6, and the relative feasibility tolerance is 1e-3 (used if the absolute feasibility test fails). The tabu list length was set adaptively to a value inversely proportional to the number of integer variables of the instance (i.e., the number of values to be stored for each solution in the tabu list): namely, 60,000 divided by the number of integer variables. The actual mean number, over the full set of 243 instances, of solutions stored in the tabu list was 35.
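A minimal sketch of this parameter logic. The constant 60,000 and the two tolerances are from the text; the normalization used in the relative feasibility test is our assumption, since the text does not spell it out.

```python
# Sketch of the code-tuning parameters: adaptive tabu-list length and the
# two-stage (absolute, then relative) feasibility test for g(x) <= 0.

def tabu_list_length(n_integer_vars, budget=60000):
    """Length inversely proportional to the number of integer variables."""
    return max(1, budget // n_integer_vars)

def constraint_satisfied(violation, scale, abs_tol=1e-6, rel_tol=1e-3):
    """Accept if the violation passes the absolute test or, failing that,
    if it is small relative to the constraint's magnitude (one plausible
    reading of the relative test; the normalization is an assumption)."""
    return violation <= abs_tol or violation <= rel_tol * max(1.0, abs(scale))
```

For instance, an instance with 2,000 integer variables would get a tabu list of length 30, consistent with the reported mean of 35 stored solutions.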

5.4 Results

The six surviving FP variants have been extensively tested on the full set of 243 MINLP instances; in particular, we discuss the results on the 65 "hard" instances introduced in Sect. 5.2. More precisely, the six variants have the characteristics reported in Table 1. Table 2 reports the aggregated results on the 65 "hard" instances. In particular, we consider for each FP variant the number of times the algorithm terminated with a feasible solution within the 2-hour user CPU time limit (successes), the number of times the algorithm was the only one to find a feasible solution (successes alone), the number of times the time limit was reached without a feasible solution (time limit reached), the number of times the algorithm encountered numerical issues (fails), the number of times the algorithm found the best, i.e., smallest, solution value (wins), and the geometric mean of the computing time for the solved instances (time geomean). The detailed results are reported in Tables 3 and 4 where, for each variant, we give the solution value (value), the computing time (time) and the number of iterations (it.s), which is roughly equal to the number of problems (8) and (9) solved. In case of numerical issues for a pair instance/FP variant, we report "++" in all entries for such an instance, whereas in case of time limit reached the entry value is set to "-" (while we correctly report the computing time of 7,200 CPU seconds and the number of iterations within such a time). The results of Tables 2, 3 and 4 show that FP-1 is the most successful FP variant and is remarkably able to find a feasible solution in limited CPU time on 75% of the "hard" instances in the testbed.
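For reference, the "time geomean" aggregate used in Table 2 is the plain geometric mean over solved instances; a minimal sketch:

```python
# Geometric mean of computing times over the solved instances, as used in the
# "time geomean" row of Table 2 (computed in log-space for numerical safety).
import math

def geomean(times):
    return math.exp(sum(math.log(t) for t in times) / len(times))
```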
A direct comparison with the closest variant, namely FP-2, shows that the use of the norm constraint is useful: although FP-1 does not dominate FP-2, it is overall superior on all entries, and there are many instances in which FP-2 converges slowly whereas FP-1 reaches feasibility in a very small number of iterations. Variant FP-3 is very fast but seems to be a bit "unsophisticated" for those instances which look more difficult (in the "hard" testbed). However, it might be a viable option for a "cheap" FP variant executed extensively within a GO solver. Variant FP-4 does not look, at the moment, very competitive, although it is not fully dominated because it finds the smallest solution four times, in one case (deb8) a much smaller one than the other variants. One relevant issue for FP-4 seems to be that the MIQP solved as problem (9) is time consuming, thus allowing only a limited number of FP iterations. Things might change in the future, depending on the solver or its settings. Finally, variants FP-5 and FP-6 are very close to FP-1 and FP-2, respectively, and they indeed lead to similar results. Specifically, FP-5, compared to FP-1, seems to have many more numerical troubles (due to the NLP solver, see Sect. 3) and is inferior in terms of quality of the solutions obtained (wins), but is much faster. Instead, variant FP-6 is almost equivalent, perhaps superior, to FP-2, the two main differences being the number of wins (20 for FP-2 versus 8 for FP-6) and the speed (104.45 CPU seconds for FP-2 versus 14.99 for FP-6). Overall, both FP-5 and FP-6 seem promising for further investigation. Concerning the interaction of FP-1 with the GO solver COUENNE (or any other), note that in 14 cases FP-1 finds a feasible solution within 1 min of CPU time (in 24 cases within 5 min), thus suggesting a profitable integration within the solver.

6 Conclusion

We have presented the theoretical foundation of an abstract Feasibility Pump scheme interpreted as a Successive Projection Method in which, roughly speaking, the set of constraints of the original problem is split (possibly in different ways) into two sets and the overall algorithm aims at deciding if the feasibility space given by the intersection of such two sets is empty. Such a scheme has been specialized for dealing with nonconvex Mixed-Integer Nonlinear Programming problems, the hardest class of (deterministic) optimization problems.

[Table 3: Comparing FP variants FP-1, FP-2, FP-3 and FP-4, detailed results (solution value, computing time, and number of iterations per instance); the per-instance entries are not recoverable from the extracted text.]

[Table 4: Comparing FP variants FP-1, FP-2, FP-5 and FP-6, detailed results (solution value, computing time, and number of iterations per instance); the per-instance entries are not recoverable from the extracted text.]

Because the devil is in the details, we analyzed a large number of options for (i) formulating and solving the two distinct problems originated by the above split and (ii) guaranteeing convergence of the global algorithm. The result has been more than twenty FP variants, which have been computationally tested on a large number of MINLP instances from the literature to assess the viability of FP both as a stand-alone heuristic algorithm and as a primal heuristic within a global optimization solver. Six especially interesting variants have been discussed in detail, and extensive results have been presented on a set of 65 "hard" instances. The results show that feasibility pumps are indeed successful in finding feasible solutions for nonconvex MINLPs.

Acknowledgments One of the authors (LL) is grateful for the following financial support: ANR 07-JCJC-0151 “ARS”and 08-SEGI-023 “AsOpt”; Digiteo Emergence “ARM”, Emergence “PASO”, Chair “RMNCCO”; Microsoft-CNRS Chair “OSD”. The remaining three authors are grateful to the other author (LL) for not making them feel too “poor”. We thank two anonymous referees for a careful reading and useful comments. Part of this work was conducted while the first author was a post-doctoral student at the ISyE, University of Wisconsin-Madison and DEIS, University of Bologna. The support of both institutions is strongly acknowledged.

References

1. Achterberg, T., Berthold, T.: Improving the feasibility pump. Discrete Opt. 4, 77–86 (2007)
2. Al-Khayyal, F.A., Sherali, H.D.: On finitely terminating branch-and-bound algorithms for some global optimization problems. SIAM J. Opt. 10, 1049–1057 (2000)
3. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
4. Balas, E., Jeroslow, R.: Canonical cuts on the unit hypercube. SIAM J. Appl. Math. 23, 61–69 (1972)
5. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 367–426 (1996)
6. Belotti, P., Lee, J., Liberti, L., Margot, F., Wächter, A.: Branching and bound tightening techniques for non-convex MINLP. Opt. Methods Softw. 24, 597–634 (2009)
7. Berthold, T., Gleixner, A.: Undercover primal MINLP heuristic. In: Bonami, P., Liberti, L., Miller, A., Sartenaer, A. (eds.) Proceedings of the European Workshop on Mixed-Integer Nonlinear Programming, pp. 103–113. Université de la Méditerranée, Marseille (2010)
8. Bonami, P., Cornuéjols, G., Lodi, A., Margot, F.: A feasibility pump for mixed integer nonlinear programs. Math. Prog. 119, 331–352 (2009)
9. Bonami, P., Gonçalves, J.: Primal heuristics for mixed integer nonlinear programs. Technical report, IBM (2008)
10. Bonami, P., Lee, J.: BONMIN user's manual. Technical report, IBM Corporation (June 2007)
11. Bruglieri, M., Liberti, L.: Optimal running and planning of a biomass-based energy production process. Energy Policy 36, 2430–2438 (2008)
12. Bussieck, M.R., Drud, A.S., Meeraus, A.: MINLPLib – a collection of test models for mixed-integer nonlinear programming. INFORMS J. Comput. 15, 114–119 (2003)
13. COIN-OR: Introduction to IPOPT: a tutorial for downloading, installing, and using IPOPT (2006)
14. COUENNE. https://projects.coin-or.org/Couenne, v. 0.1
15.
D'Ambrosio, C.: Application-oriented Mixed Integer Non-Linear Programming. PhD thesis, DEIS, Università di Bologna (2009)
16. D'Ambrosio, C., Frangioni, A., Liberti, L., Lodi, A.: Experiments with a feasibility pump approach for nonconvex MINLPs. In: Festa, P. (ed.) Symposium on Experimental Algorithms, volume 6049 of LNCS, pp. 350–360. Springer, Heidelberg (2010)
17. D'Ambrosio, C., Frangioni, A., Liberti, L., Lodi, A.: On interval-subgradient cuts and no-good cuts. Oper. Res. Lett. 38, 341–345 (2010)

18. d'Antonio, G., Frangioni, A.: Convergence analysis of deflected conditional approximate subgradient methods. SIAM J. Opt. 20, 357–386 (2009)
19. Duran, M., Grossmann, I.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Prog. 36, 307–339 (1986)
20. Fischetti, M., Glover, F., Lodi, A.: The feasibility pump. Math. Prog. 104, 91–104 (2005)
21. Fletcher, R., Leyffer, S.: Solving mixed integer nonlinear programs by outer approximation. Math. Prog. 66, 327–349 (1994)
22. Fourer, R., Gay, D., Kernighan, B.: AMPL: A Modeling Language for Mathematical Programming, 2nd edn. Duxbury Press, Brooks, Cole Publishing Co., Florence, KY (2003)
23. Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26, 127–136 (2000)
24. Hansen, P., Mladenović, N.: Variable neighbourhood search: principles and applications. Eur. J. Oper. Res. 130, 449–467 (2001)
25. ILOG: ILOG CPLEX 11.0 User's Manual. ILOG S.A., Gentilly, France (2008)
26. Liberti, L.: Writing global optimization software. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, pp. 211–262. Springer, Berlin (2006)
27. Liberti, L., Cafieri, S., Savourey, D.: Reformulation Optimization Software Engine. In: Fukuda, K., van der Hoeven, J., Joswig, M., Takayama, N. (eds.) Mathematical Software, vol. 6327 of LNCS, pp. 303–314. Springer, New York (2010)
28. Liberti, L., Cafieri, S., Tarissan, F.: Reformulations in mathematical programming: a computational approach. In: Abraham, A., Hassanien, A.-E., Siarry, P., Engelbrecht, A. (eds.) Foundations of Computational Intelligence, vol. 3, pp. 153–234. Studies in Computational Intelligence, Springer, Berlin (2009)
29. Liberti, L., Dražić, M.: Variable neighbourhood search for the global optimization of constrained NLPs. In: Proceedings of GO Workshop, Almeria, Spain (2005)
30.
Liberti, L., Mladenović, N., Nannicini, G.: A good recipe for solving MINLPs. In: Maniezzo, V., Stützle, T., Voß, S. (eds.) Hybridizing Metaheuristics and Mathematical Programming, vol. 10, pp. 231–244. Annals of Information Systems, Springer, New York (2009)
31. Nannicini, G., Belotti, P.: Local branching for MINLPs. Technical report, CMU (2009)
32. Nannicini, G., Belotti, P.: Rounding-based heuristics for nonconvex MINLPs. Technical report, CMU (2009)
33. Schoen, F.: Two-phase methods for global optimization. In: Pardalos, P.M., Romeijn, H.E. (eds.) Handbook of Global Optimization, vol. 2, pp. 151–177. Kluwer, Dordrecht (2002)
34. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Opt. Theory Appl. 109, 475–494 (2001)
35. Tuy, H.: Convex Analysis and Global Optimization. Kluwer, Dordrecht (1998)
36. Vavasis, S.A.: Nonlinear Optimization: Complexity Issues. Oxford University Press, Oxford (1991)
37. von Neumann, J.: Functional Operators, Vol. II. Princeton University Press, Princeton (1950)

European Journal of Operational Research xxx (2014) xxx–xxx


Continuous Optimization

Optimistic MILP modeling of non-linear optimization problems

Riccardo Rovatti a, Claudia D'Ambrosio b, Andrea Lodi c,*, Silvano Martello a

a DEI, University of Bologna, Italy
b CNRS LIX, École Polytechnique, France
c DEI, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Article history: Received 23 October 2013; Accepted 13 March 2014; Available online xxxx

Keywords: Nonlinear programming; OR in Energy; Piecewise linear approximation

Abstract: We present a new piecewise linear approximation of non-linear optimization problems. It can be seen as a variant of classical triangulations that leaves more degrees of freedom to define any point as a convex combination of the samples. For example, in the traditional Union Jack approach a (two-dimensional) variable domain is split by a rectangular grid, and one has to select the diagonals that induce the triangles used for the approximation. For a hyper-rectangular domain U ⊆ R^L, partitioned into hyper-rectangular subdomains through a grid defined by n_l points on the l-axis (l = 1, ..., L), the number of potential simplexes is L! ∏_{l=1}^{L} (n_l − 1), and an MILP model incorporating it without complicated encoding strategies must have the same number of additional binary variables. In the proposed approach the choice of the simplexes is optimistically guided by one of two approximating objective functions (one convex, one concave), and the number of additional binary variables needed by a straightforward implementation drops to only ∑_{l=1}^{L} (n_l − 1). The method generalizes to the splitting of U into L-dimensional bounded polytopes in R^L in which samples can be taken not only at the vertices of the polytopes but also inside them, thus allowing, for example, off-grid oversampling of interesting regions. When addressing polytopes that are regularly spaced hyper-rectangles, the method allows modeling of the domain partition with a logarithmic number of constraints and binary variables.
The simultaneous use of both convex and concave piecewise linear approximations is reminiscent of global optimization techniques, which are, on the one hand, stronger because they lead to convex relaxations and not only approximations of the problem at hand, but, on the other hand, significantly more arduous from a computational standpoint. We show theoretical properties of the approximating functions, and provide computational evidence of the impact of their use within MILP models approximating non-linear problems.
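The two counts contrasted in the abstract can be checked with a short sketch (the grid sizes below are illustrative):

```python
# Number of potential simplexes of a triangulated grid, L! * prod(n_l - 1),
# versus the sum(n_l - 1) extra binaries of the proposed optimistic model.
from math import factorial, prod

def n_simplexes(n):
    """n = (n_1, ..., n_L): grid points per axis."""
    return factorial(len(n)) * prod(nl - 1 for nl in n)

def n_optimistic_binaries(n):
    return sum(nl - 1 for nl in n)

# e.g. a 5 x 5 x 5 grid in dimension L = 3:
# n_simplexes((5, 5, 5)) = 6 * 4 * 4 * 4 = 384, versus only 12 binaries.
```

The gap grows quickly with L, which is the motivation for the optimistic formulation.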

1. Introduction

The linearization of non-linear functions plays an important role in Mixed Integer Linear Programming (MILP) approaches to the solution of non-linear optimization problems, especially in the real-world context (see, e.g., Borghetti, D'Ambrosio, Lodi, & Martello, 2008; Irion, Lu, Al-Khayyal, & Tsao, 2012; Silva & Camponogara, 2014, just to mention a few recent ones). Classical approaches are based on piecewise linear approximation.

For functions f(x_1) of a single variable, this is obtained by splitting the variable domain into intervals, and by sampling the function at their extremes: the function is then approximated by the segments produced by the convex combination of the sampled function values. Fig. 1(a) shows a function of a single variable, along with the partition of its domain into 4 regions (segments r_1, ..., r_4). Fig. 1(b) gives the resulting convex combinations.

For functions f(x_1, x_2) of two variables, various methods provide piecewise linear approximations. The most common approaches are based on triangulations, pictorially illustrated in Fig. 2. The variable domain is split by a grid, and each cell of the grid is subdivided into two triangles: the convex combinations of the function values sampled at the vertices of a triangle uniquely provide the linear approximation. Fig. 2 reports the domain partition and the sampling points both for the original function (Fig. 2(a)) and for the corresponding piecewise-linear approximation (Fig. 2(b)). Various approaches are discussed, e.g., in Todd (1977), Babayev (1997), D'Ambrosio, Lodi, and Martello (2010).

* Corresponding author. Tel.: +39 0512093029; fax: +39 0512093073. E-mail address: [email protected] (A. Lodi).

In general, we will consider a closed domain U ⊆ R^L, and functions f(x_1, ..., x_L): U → R of L variables. We want to obtain a piecewise linear approximation of f by splitting its domain U into a finite collection R of L-dimensional convex bounded polytopes in R^L, that we will call regions. Recalling that the interior of an
http://dx.doi.org/10.1016/j.ejor.2014.03.020 0377-2217/© 2014 Elsevier B.V. All rights reserved.

Please cite this article in press as: Rovatti, R., et al. Optimistic MILP modeling of non-linear optimization problems. European Journal of Operational Research (2014), http://dx.doi.org/10.1016/j.ejor.2014.03.020

Fig. 1. Piecewise-linear approximation of a single-variable function.
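The single-variable construction just described — sample $f$ at the interval extremes and approximate $f(x)$ by the convex combination of the two bracketing samples — can be sketched in a few lines of Python (a minimal illustration; the function and breakpoint counts are arbitrary toy choices, not taken from the paper):

```python
import numpy as np

def sample_breakpoints(f, lo, hi, n):
    """Split [lo, hi] into n-1 segments r_1..r_{n-1} and sample f at the extremes."""
    xs = np.linspace(lo, hi, n)
    return xs, np.array([f(x) for x in xs])

def pw_linear(xs, fs, x):
    """Approximate f(x) by the convex combination of the two samples bracketing x:
    x = a*xs[i] + (1-a)*xs[i+1]  =>  f(x) ~ a*fs[i] + (1-a)*fs[i+1]."""
    i = int(np.clip(np.searchsorted(xs, x) - 1, 0, len(xs) - 2))
    a = (xs[i + 1] - x) / (xs[i + 1] - xs[i])   # weight of the left extreme
    return a * fs[i] + (1 - a) * fs[i + 1]
```

For instance, sampling $f(t) = t^2$ at 5 breakpoints of $[0,1]$ and evaluating `pw_linear` between breakpoints reproduces the segment interpolation of Fig. 1(b); on a convex function the segments always lie on or above the graph.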

Fig. 2. Piecewise-linear approximation of a two-variable function.

Fig. 3. Two possible linear interpolations exploiting the samples at the vertices of a rectangle.

$L$-dimensional polytope is the set of its points not belonging to any of its facets, we require that regions satisfy two constraints. Namely,

C1 if $r', r'' \in \mathcal{R}$, then either they have disjoint interiors or $r' = r''$;
C2 if $r', r'' \in \mathcal{R}$, $s'$ is an ($(L-1)$-dimensional) facet of $r'$ and $s''$ is an ($(L-1)$-dimensional) facet of $r''$, then either $s'$ and $s''$ have disjoint interiors or $s' = s''$.

Let us denote the number of vertices (0-dimensional faces) of region $r$ by $m[r]$, and the vertices themselves by $v_j[r]$ ($j = 1,\dots,m[r]$). The value of function $f$ at the vertices is obtained by sampling the function at $v_j[r]$ for $r \in \mathcal{R}$ and $j = 1,\dots,m[r]$. No global knowledge is assumed for defining the approximation, i.e., the only available information is carried by samples of $f$. For a non-degeneracy argument, we will also impose that any region $r \in \mathcal{R}$ satisfies

C3 $m[r] > L$ with at least $L$ of the $m[r]$ vertices linearly independent.

As $r$ is a convex bounded polytope, a point $x = (x_1,\dots,x_L) \in r$ can be expressed as a convex combination of the vertices of $r$. The most natural approach to define a region $r$ is to use $L+1$ independent vertices (i.e., an $L$-dimensional simplex), so that the convex combination is unique, as well as the resulting piecewise linear approximation of $f$. This is the most classical approximation method (see Todd (1977) and Lee and Wilson (2001)).

As already observed, in the case of functions of two variables (i.e., $L = 2$), the regions are triangles. Note that each rectangle in the sampling grid, say of vertices $v_1, v_2, v_3$ and $v_4$ (see Fig. 3(a)), can be subdivided into two triangles by selecting either diagonal $[v_1, v_4]$ or diagonal $[v_2, v_3]$. As a consequence, a point $x$ in the shaded area of Fig. 3(a) can be expressed as a convex combination of two distinct sets of vertices, namely $\{v_1, v_2, v_4\}$ and $\{v_1, v_2, v_3\}$. In Fig. 3(b) we show the plot of the two possible interpolations for all the points in the shaded domain of Fig. 3(a), with dashed lines connecting each interpolation with the samples used for its computation.

On the other hand, a region $r$ does not need to be an $L$-dimensional simplex, and a point $x \in r$ can be expressed in different ways as a convex combination of its vertices. Again for the case $L = 2$, this would mean using all four vertices of the rectangle in Fig. 3(a) to define the convex combination. As will be shown in the next sections, such a degree of freedom can be exploited to discriminate combinations among each other, for example in the case where special properties of $f$ are known.

Beyond samples at the vertices of each region, we may want to consider samples inside the regions. Hence, for each $r \in \mathcal{R}$ we define a set of $\omega[r] \geq 0$ points $w_j[r]$ with $j = 1,\dots,\omega[r]$ such that $w_j[r] \in r$ and for which $f(w_j[r])$ is known. Once the domain partition $\mathcal{R}$ is decided, regions $r \in \mathcal{R}$ for which $\omega[r] > 0$ can be thought of as ``oversampled'' regions in which information is collected without a predetermined structure.

Based on the above definitions we give the two approximating functions, $\underline{f}$ and $\hat{f}$, by the solutions of the two linear problems ((1) s.t. (3)–(6) and (2) s.t. (3)–(6)):

$$\underline{f}(x) = \min \sum_{j=1}^{m[r]} \alpha_j f(v_j[r]) + \sum_{j=1}^{\omega[r]} \beta_j f(w_j[r]) \quad (1)$$
$$\hat{f}(x) = \max \sum_{j=1}^{m[r]} \alpha_j f(v_j[r]) + \sum_{j=1}^{\omega[r]} \beta_j f(w_j[r]) \quad (2)$$
$$\alpha_j \geq 0 \qquad j = 1,\dots,m[r] \quad (3)$$
$$\beta_j \geq 0 \qquad j = 1,\dots,\omega[r] \quad (4)$$
$$\sum_{j=1}^{m[r]} \alpha_j + \sum_{j=1}^{\omega[r]} \beta_j = 1 \quad (5)$$
$$\sum_{j=1}^{m[r]} \alpha_j v_j[r] + \sum_{j=1}^{\omega[r]} \beta_j w_j[r] = x. \quad (6)$$

Among all possible convex combinations, $\underline{f}$ (resp. $\hat{f}$) selects the one yielding the minimum (resp. maximum) value of the combination of the corresponding function values. Note that the system above is identical for any possible choice of region $r$. However, in the special case in which $\omega[r] = 0$ and $r$ itself is a simplex for which $m[r] = L+1$, there is nothing to optimize, as the convex combination is unique.
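For a given point $x$ and region, the two linear programs (1) s.t. (3)–(6) and (2) s.t. (3)–(6) can be solved directly. The sketch below (Python with SciPy; the square region, the sample values, and the helper name are illustrative assumptions, not from the paper) computes $\underline{f}(x)$ and $\hat{f}(x)$, optionally with oversampled interior points:

```python
import numpy as np
from scipy.optimize import linprog

def f_under_over(x, verts, fverts, inner=None, finner=None):
    """Solve the LPs (1) s.t. (3)-(6) and (2) s.t. (3)-(6) for one region:
    min/max of sum_j a_j f(v_j) + sum_j b_j f(w_j) over convex multipliers
    that reproduce x. Returns (f_under(x), f_over(x))."""
    pts = np.asarray(verts, dtype=float)
    vals = np.asarray(fverts, dtype=float)
    if inner is not None:                      # oversampled points w_j[r]
        pts = np.vstack([pts, inner])
        vals = np.concatenate([vals, finner])
    n = len(pts)
    # equalities (5)-(6): multipliers sum to 1 and reproduce x
    A_eq = np.vstack([np.ones(n), pts.T])
    b_eq = np.concatenate([[1.0], np.asarray(x, dtype=float)])
    lo = linprog(vals, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    hi = linprog(-vals, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return lo.fun, -hi.fun
```

On the unit square with $f(x,y) = xy$ sampled at the four corners, the point $(1/2, 1/2)$ gives $\underline{f} = 0$ (combination of $(1,0)$ and $(0,1)$) and $\hat{f} = 1/2$ (combination of $(0,0)$ and $(1,1)$), i.e., exactly the two triangle interpolations of Fig. 3(b).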


In the more general case, assume now that the function $f$ is the objective function of a non-linear minimization (resp. maximization) problem, and that such a problem is solved by approximating $f$ with $\underline{f}$ (resp. $\hat{f}$). In this case the minimization (resp. maximization) defining $\underline{f}$ (1) (resp. $\hat{f}$ (2)) is implied by the minimization (resp. maximization) requested by the problem.

Hence, the solver of the resulting piecewise linear problem implicitly uses an approximation that yields, for each feasible point and among all the possible linear interpolations of the samples that are available due to non-simplex regions and oversampling, the one that agrees most with the sense of the optimization problem: the largest when seeking a maximum or the smallest when seeking a minimum. Actually, this is the optimistic attitude that gives the name to the proposed approximation scheme when embedded in an optimization problem. For the case depicted in Fig. 3 and $\omega[r] = 0$, $\underline{f}(x)$ would use the lower triangle in Fig. 3(b), and $\hat{f}(x)$ the upper triangle.

Note that the use of a piecewise linear approximation that may be locally convex or concave may remind of global optimization techniques (see, e.g., Belotti et al. (2013)). Actually, in that context, a convex piecewise linear under-estimation and a concave piecewise linear over-estimation of a generic non-convex function are used to create a convex polyhedral relaxation, and those under- and over-estimations are iteratively improved by branching. This leads to computing a globally optimal solution for the problem at hand, with the significant difficulty that finding appropriate under- and over-estimations might be hard, especially for $L > 2$, and refining them is computationally arduous (see again Belotti et al. (2013) and Tawarmalani, Richard, and Xiong (2013)). In our approach, instead, convex and concave approximations are not simultaneously employed but implicitly chosen by the solver, and are not refined over time. The result is that the computed solution is approximated, i.e., it might not even be feasible when the function $f$ appears in a constraint of the original MINLP problem, with the relevant advantage that the off-the-shelf solution of a single MILP suffices.

In the remainder of the paper we will show that $\underline{f}$ and $\hat{f}$ have favorable theoretical properties (Section 2) and that they are especially good if used within an MILP approach (Section 3), resulting in more tractable MILP models with respect to traditional models. Computational results using the proposed approximations are reported in Section 4, where we evaluate their performance in the MILP context. Some conclusions are finally drawn in Section 5.

2. Theoretical properties of optimistic modeling

In this section we derive the properties of the piecewise-linear approximation introduced in the previous section under the assumption that the target function $f(x_1,\dots,x_L) : U \to \mathbb{R}$ has second-order derivatives arranged in a Hessian matrix $H_{jk} = \partial^2 f / \partial x_j \partial x_k$ ($j,k = 1,\dots,L$) that is bounded, i.e., such that

$$\max_{j,k}\Big\{\max_U |H_{jk}|\Big\} \leq M \quad (7)$$

for some finite $M > 0$. Given a region $r \subseteq U$, we define:

- its diameter as the smallest positive $d[r]$ such that the Euclidean norm satisfies $\|x' - x''\| \leq d[r]$ for any pair of points $x', x'' \in r$;
- the set $F^M(r)$ of the functions $\tilde{f} : r \to \mathbb{R}$ that are piecewise linear interpolations of $f$, with ``pieces'' being non-overlapping simplexes whose vertices belong to $\{v_1[r],\dots,v_{m[r]}[r]\} \cup \{w_1[r],\dots,w_{\omega[r]}[r]\}$. In other words, $F^M(r)$ contains all possible piecewise linear interpolations based on $L+1$ sampling points at a time;
- the difference between any two functions $f$ and $g$ restricted to $r$ as their maximum absolute deviation, i.e., $D_r(f,g) = \max_{x \in r} |f(x) - g(x)|$;
- the maximum error due to an approximation in $F^M(r)$ as (Strang and Fix, 1973, chap. 3)

$$\Delta_{\max}(r) = \max_{\tilde{f} \in F^M(r)} D_r(f, \tilde{f}), \quad (8)$$

which is known to be $O(M d[r]^2)$.

Note that, unless assumptions can be made both on the nonlocal behavior of the function to approximate and on the shape of the regions (e.g., on the smallest angle between any two incident edges of any possible simplex with vertices in $\{v_1[r],\dots,v_{m[r]}[r]\} \cup \{w_1[r],\dots,w_{\omega[r]}[r]\}$), neither the second-order trend in $d[r]$ nor the value of the constant implicit in the $O$ notation can be turned into stricter guarantees on the approximation error.

The following result ensures that the approximations introduced in the previous section can be profitably used to reduce the complexity of piecewise-linear models of non-linear optimization problems without impairing approximation quality. Though some of the points in Theorem 1 can be inferred from ongoing studies on the properties of convex and concave envelopes of non-linear functions (see, e.g., Theorem 2.4 in Tawarmalani, Richard, & Xiong (2013)), we chose to present them in a self-contained form to integrate other issues that are equally important for our purposes, like continuity of the approximation, asymptotic error bounds, and optimality of the approximation for convex and concave non-linear functions.

Theorem 1. The approximations $\underline{f}$ and $\hat{f}$ defined by (1)–(6) are such that

A. $\underline{f}$ restricted to any $r \in \mathcal{R}$ is convex while $\hat{f}$ restricted to any $r \in \mathcal{R}$ is concave;
B. $\underline{f}$ and $\hat{f}$ are continuous;
C. $\underline{f}$ and $\hat{f}$ restricted to any $r \in \mathcal{R}$ belong to $F^M(r)$;
D. if $x \in r$, then for any $y$ such that $\underline{f}(x) \leq y \leq \hat{f}(x)$ there is a set of real numbers $\alpha'_j$ for $j = 1,\dots,m[r]$ and $\beta'_j$ for $j = 1,\dots,\omega[r]$ such that $\alpha'_j, \beta'_j \geq 0$ $\forall j$, $\sum_{j=1}^{m[r]} \alpha'_j + \sum_{j=1}^{\omega[r]} \beta'_j = 1$, $\sum_{j=1}^{m[r]} \alpha'_j v_j[r] + \sum_{j=1}^{\omega[r]} \beta'_j w_j[r] = x$, and $\sum_{j=1}^{m[r]} \alpha'_j f(v_j[r]) + \sum_{j=1}^{\omega[r]} \beta'_j f(w_j[r]) = y$;
E. for any $r \in \mathcal{R}$ we have $D_r(f, \underline{f}) \leq \Delta_{\max}(r)$ and $D_r(f, \hat{f}) \leq \Delta_{\max}(r)$;
F. if $f$ is linear then $\underline{f} = \hat{f} = f$;
G. if $f$ is convex (resp. concave) in a certain $r \in \mathcal{R}$, then in that region $\underline{f}$ (resp. $\hat{f}$) is the best possible linear interpolation of the samples in the sense of $D_r(f, \cdot)$;
H. if $f$ is convex (resp. concave) in a certain $r \in \mathcal{R}$ and we compute $\underline{f}$ (resp. $\hat{f}$) relying on the sampling points $\{v_1[r],\dots,v_{m[r]}[r], w_1[r],\dots,w_{\omega[r]}[r]\}$ and $\underline{f}'$ (resp. $\hat{f}'$) relying on the sampling points $\{v_1[r],\dots,v_{m[r]}[r], w_1[r],\dots,w_{\omega[r]}[r], w_{\omega[r]+1}[r]\}$, then in that region $\underline{f}'$ is not worse than $\underline{f}$ (resp. $\hat{f}'$ is not worse than $\hat{f}$) in

Please cite this article in press as: Rovatti, R., et al. Optimistic MILP modeling of non-linear optimization problems. European Journal of Operational Research (2014), http://dx.doi.org/10.1016/j.ejor.2014.03.020 4 R. Rovatti et al. / European Journal of Operational Research xxx (2014) xxx–xxx

the sense of $D_r(f, \cdot)$, and at least one point $x \in r$ exists such that $\underline{f}'(x)$ (resp. $\hat{f}'(x)$) gives a non-zero coefficient to the sample $f(w_{\omega[r]+1}[r])$.

Proof. For any region $r$, define $\mathbf{f} = (f(v_1[r]),\dots,f(v_{m[r]}[r]), f(w_1[r]),\dots,f(w_{\omega[r]}[r]))$ and $c = (\alpha_1,\dots,\alpha_{m[r]}, \beta_1,\dots,\beta_{\omega[r]})$. Further let $\sigma(x)$ be the set of the coefficients of the affine combinations of the sampling points that yield $x$, i.e., $\sigma(x) = \{c : \sum_{j=1}^{m[r]} \alpha_j + \sum_{j=1}^{\omega[r]} \beta_j = 1$ and $\sum_{j=1}^{m[r]} \alpha_j v_j[r] + \sum_{j=1}^{\omega[r]} \beta_j w_j[r] = x\}$, and $\gamma$ the set of nonnegative coefficients in $c$, i.e., $\gamma = \{c : \alpha_j \geq 0$ for $j = 1,\dots,m[r]$ and $\beta_j \geq 0$ for $j = 1,\dots,\omega[r]\}$. With this, the feasibility polytope of the problems defining $\underline{f}(x)$ and $\hat{f}(x)$ is $\sigma(x) \cap \gamma$.

A. Focus on $\underline{f}$, and consider any two points $x_1, x_2 \in r \in \mathcal{R}$ and a generic point $x_\varepsilon = \varepsilon x_1 + (1-\varepsilon)x_2$ for $\varepsilon \in [0,1]$. With reference to (1), we know that $\underline{f}(x_1) = c(x_1)\mathbf{f}^T$ for some vector $c(x_1) \in \sigma(x_1) \cap \gamma$, and that $\underline{f}(x_2) = c(x_2)\mathbf{f}^T$ for some $c(x_2) \in \sigma(x_2) \cap \gamma$. Hence $\varepsilon \underline{f}(x_1) + (1-\varepsilon)\underline{f}(x_2) = (\varepsilon c(x_1) + (1-\varepsilon)c(x_2))\mathbf{f}^T = c(x_\varepsilon)\mathbf{f}^T$ with $c(x_\varepsilon) \in \sigma(x_\varepsilon) \cap \gamma$. It follows that $c(x_\varepsilon)$ contains the coefficients of one of the convex combinations yielding $x_\varepsilon$ in terms of the sampling points and, from (1), $\underline{f}(x_\varepsilon) \leq c(x_\varepsilon)\mathbf{f}^T = \varepsilon \underline{f}(x_1) + (1-\varepsilon)\underline{f}(x_2)$, which proves the thesis. As far as $\hat{f}$ is concerned, given the target function $f$, define the function $u = -f$. Since $\hat{f} = -\underline{u}$ and $\underline{u}$ is convex, $\hat{f}$ is concave.

B. It follows from A that $\underline{f}$ and $\hat{f}$ are continuous in the interior of any $r \in \mathcal{R}$. To include the boundaries of the regions in the continuity set of $\underline{f}$ and $\hat{f}$, let us first reason from within a given $r \in \mathcal{R}$ and consider a point $x'$ on the boundary and a point $x''$ in the interior of $r$. Measure the distance between any two subsets $S$ and $T$ of $\mathbb{R}^{m[r]}$ by means of the Hausdorff distance $h(S,T) = \max\{\sup_{s\in S}\inf_{t\in T}\|s-t\|, \sup_{t\in T}\inf_{s\in S}\|s-t\|\}$, where $\|\cdot\|$ is the usual Euclidean norm. Since the two affine subspaces $\sigma(x')$ and $\sigma(x'')$ correspond to each other by means of a translation proportional to $x' - x''$, there exists a constant $C$ such that $h(\sigma(x'), \sigma(x'')) \leq C\|x' - x''\|$. It follows that $\lim_{x''\to x'} h(\sigma(x')\cap\gamma, \sigma(x'')\cap\gamma) \leq \lim_{x''\to x'} C\|x' - x''\| = 0$. Hence, as $x'' \to x'$, the feasibility set of the problem defining $\underline{f}(x')$ (resp. $\hat{f}(x')$) tends to coincide with the feasibility set of the problem defining $\underline{f}(x'')$ (resp. $\hat{f}(x'')$). Since the objective function of $\underline{f}(x)$ (resp. $\hat{f}(x)$) is linear, hence continuous, also its values at $x'$ and $x''$ must tend to each other. We have thus proved that $\underline{f}$ and $\hat{f}$ are continuous on the closure of each region. As each region has an ($(L-1)$-dimensional) facet in common with every neighboring region, this implies continuity across regions, and ultimately over the whole domain $U$.

C. Since $\sigma(x)$ is a $(m[r]+\omega[r]-(L+1))$-dimensional affine subspace of $\mathbb{R}^{m[r]+\omega[r]}$, the vertices of the feasibility polytope of the linear programming problems defining $\underline{f}$ and $\hat{f}$ are such that at least $m[r]+\omega[r]-(L+1)$ positivity constraints on the $\alpha_j$'s and $\beta_j$'s are active, i.e., their solutions feature at most $L+1$ non-zero variables among the coefficients in $c$. Hence, at each point $x$, the values of $\underline{f}(x)$ and $\hat{f}(x)$ are the linear interpolation of $L+1$ values of the given function $f$ at the vertices of an $L$-dimensional simplex containing $x$. Given the $m[r]+\omega[r]$ sampling points in $r$, one may construct $N[r] = \binom{m[r]+\omega[r]}{L+1}$ such simplexes by choosing any $L+1$ vertices from $\{v_1[r],\dots,v_{m[r]}[r]\} \cup \{w_1[r],\dots,w_{\omega[r]}[r]\}$. Denote such simplexes as $s_j$ for $j = 1,\dots,N[r]$, and associate to each of them the function $f_j : s_j \to \mathbb{R}$ that yields the linear interpolation of the values of $f$ at the vertices of $s_j$. Further let $S(x) = \{j : 1 \leq j \leq N[r],\ x \in s_j\}$ be the set of those simplexes which contain $x$. Then

$$\underline{f}(x) = \min_{j \in S(x)} \{f_j(x)\} \quad (9)$$
$$\hat{f}(x) = \max_{j \in S(x)} \{f_j(x)\}, \quad (10)$$

which are clearly piecewise-defined expressions. Let us now concentrate on $\underline{f}$ and characterize the corresponding ``pieces''. If $x^0$ is a point in the interior of a piece, then $\underline{f}(x)$ is linear in a ball $b(x^0)$ with center $x^0$. Without loss of generality, we may assume that in such a neighborhood we have $\underline{f}(x) = f_1(x)$, and prove that $\underline{f}(x) = f_1(x)$ holds for any $x \in s_1$. To do so, consider any point $x \in s_1 \setminus b(x^0)$, the segment $xx^0$ connecting it with $x^0$, and a further point $x'' \in xx^0 \cap b(x^0) \setminus \{x^0\}$, which is surely non-empty. Let $\varepsilon \in [0,1]$ be the real number such that $x'' = \varepsilon x^0 + (1-\varepsilon)x$. Since $\underline{f}$ is convex, we have $\underline{f}(x'') \leq \varepsilon\underline{f}(x^0) + (1-\varepsilon)\underline{f}(x)$, from which $\underline{f}(x) \geq \frac{1}{1-\varepsilon}\underline{f}(x'') - \frac{\varepsilon}{1-\varepsilon}\underline{f}(x^0)$. Since $\underline{f}(x^0) = f_1(x^0)$ and $\underline{f}(x'') = f_1(x'')$, we have $\underline{f}(x) \geq \frac{1}{1-\varepsilon}f_1(x'') - \frac{\varepsilon}{1-\varepsilon}f_1(x^0) = f_1(x)$, where we have exploited the fact that $f_1$ is linear. The fact that $\underline{f}(x) \geq f_1(x)$, paired with (9) and with $x \in s_1$, yields $\underline{f}(x) = f_1(x)$. Hence, $\underline{f}$ is linear in the whole interior of simplex $s_1$ and, from the continuity of $\underline{f}$, we get that it is linear over the whole simplex and thus that it belongs to $F^M(r)$. The same arguments prove that $\hat{f}$ belongs to $F^M(r)$.

D. Since $\underline{f}(x) \leq y \leq \hat{f}(x)$ we may set $\varepsilon = (\hat{f}(x) - y)/(\hat{f}(x) - \underline{f}(x))$ to have $y = \varepsilon\underline{f}(x) + (1-\varepsilon)\hat{f}(x)$. Since $\underline{f}(x) = \underline{c}\mathbf{f}^T$ for some $\underline{c} \in \sigma(x)\cap\gamma$, and $\hat{f}(x) = \hat{c}\mathbf{f}^T$ for some $\hat{c} \in \sigma(x)\cap\gamma$, we get that $c' = \varepsilon\underline{c} + (1-\varepsilon)\hat{c} \in \sigma(x)\cap\gamma$ and is such that $c'\mathbf{f}^T = y$.

E. Immediate from C.

F. If $f$ is linear then its Hessian is vanishing and we may set $M = 0$ in (7), so from (8) we get $\Delta_{\max}(r) = 0$. From E we finally obtain $\hat{f} = \underline{f} = f$.

G. Let us first concentrate on the convex-$f$ case. By Jensen's (1906) inequality, for any $c \in \sigma(x)\cap\gamma$, we have $f(x) \leq c\mathbf{f}^T$. Since the inequality holds for any convex combination, we have $f \leq \underline{f}$ over the whole $r$. Pairing this with the definition of $\underline{f}$, if $\tilde{f}$ is any linear interpolation of the values of $f$ at the vertices of $r$, we have $f \leq \underline{f} \leq \tilde{f}$. It follows that, at any point $x \in r$, $\tilde{f}(x) - f(x) \geq \underline{f}(x) - f(x) \geq 0$, hence the thesis. If $f$ is concave we may follow the same path and arrive at $f \geq \hat{f} \geq \tilde{f}$, yielding the same result.

H. Let us first concentrate on the convex-$f$ case. Since the problem defining $\underline{f}'$ has one more degree of freedom (the coefficient of the additional sampling point $w_{\omega[r]+1}[r]$) with respect to the one defining $\underline{f}$, we clearly have $\underline{f}' \leq \underline{f}$. Yet, from A we also have $f \leq \underline{f}'$ and thus $f \leq \underline{f}' \leq \underline{f}$, implying that $\underline{f}'$ is not worse than $\underline{f}$ in the sense of $D_r(f, \cdot)$. Moreover, if $c$ is the vector of coefficients of any affine combination of the sampling points $\{v_1[r],\dots,v_{m[r]}[r], w_1[r],\dots,w_{\omega[r]}[r]\}$ used to compute $\underline{f}$, the


convexity of $f$ and Jensen's (1906) inequality ensure that $f(w_{\omega[r]+1}[r]) \leq c\mathbf{f}^T$. Hence, at least for the computation of $\underline{f}'(w_{\omega[r]+1}[r])$, the additional sample cannot be neglected without violating the optimality requirement in (1)–(6). If $f$ is concave we may follow the same path and arrive at $f \geq \hat{f}' \geq \hat{f}$ and at the fact that at least $\hat{f}'(w_{\omega[r]+1}[r])$ involves the additional sample with a non-null coefficient. □

Fig. 4 depicts, for the case $L = 2$ and a single region given by a regular heptagon, the three-dimensional plots of $\underline{f}(x)$ and $\hat{f}(x)$ and the induced triangulations of the heptagonal region. Fig. 4(a) (resp. Fig. 4(b)) considers $\underline{f}$ (resp. $\hat{f}$) for $m[r] = 7$ and $\omega[r] = 0$, while Fig. 4(a') (resp. Fig. 4(b')) adds $\omega[r] = 3$ sampling points at the vertices of an equilateral triangle. The new sampling points affect $\underline{f}$ in Fig. 4(a'), clearly improving the approximation quality with respect to any target convex function $f$ with those samples. The same points do not affect $\hat{f}$ in Fig. 4(b'), since the ``direction'' of its intrinsic optimism prevents such an approximation from taking into account novel ``bad'' news about the behavior of the target function. Convexity and concavity, as well as their piecewise-linearity, can be verified by simple visual inspection in all cases.

Theorem 1 holds for regions that are not necessarily simplexes, and establishes the ``equivalence'' between $\underline{f}$ and $\hat{f}$ on one hand and classical piecewise linear interpolation of $f$ within triangulated domains on the other hand. Indeed, both $\underline{f}$ and $\hat{f}$ use the samples in each region to implicitly triangulate it into a number of simplexes within which they are nothing but a conventional piecewise linear interpolation. For this reason, in absence of a priori information on the target function, further to its values at the sampling points, the maximum approximation error is of the same magnitude as that of any other piecewise linear interpolation.

3. Embedding $\underline{f}$ and $\hat{f}$ in an MILP model

Both $\underline{f}$ and $\hat{f}$ may be exploited in the definition of a piecewise linear approximation of a non-linear optimization problem by means of a Mixed Integer Linear Programming problem. Formally speaking, we address the optimization problem

$$\min f(x) \quad (11)$$
$$Ax \leq b \quad (12)$$
$$g(x) \leq 0 \quad (13)$$
$$x \in U, \quad (14)$$

where $f(x) : \mathbb{R}^L \to \mathbb{R}$ is a smooth non-linear objective function, $Ax \leq b$ is a set of linear constraints, $g(x) \leq 0$ is a set of smooth non-linear constraints, and $U \subseteq \mathbb{R}^L$.

Recall from Section 1 that $U$ is partitioned into polytopes $r$ collected in $\mathcal{R}$. Our piecewise linear approximation of problem (11)–(14) is based on the values of $f$ and $g$ at the sampling points. We will denote by $V = \cup_{r\in\mathcal{R}}\{v_1[r],\dots,v_{m[r]}[r]\}$ the set of all vertices of all regions in $\mathcal{R}$, which we will assume to contain a total of $m[\mathcal{R}]$ elements, and by $W = \cup_{r\in\mathcal{R}}\{w_1[r],\dots,w_{\omega[r]}[r]\}$ the set of all the non-vertex sampling points in all regions in $\mathcal{R}$, which we will assume to contain a total of $\omega[\mathcal{R}]$ elements.

Fig. 4. An example of the three-dimensional plot of $\underline{f}$ ((a) and (a')) and of $\hat{f}$ ((b) and (b')), without and with internal oversampling. Note that the figures are rotated in different ways to highlight convexity or concavity.

The linear interpolation of the function values at the sampling points can be modeled by associating a real variable $\alpha_v$ to any vertex $v \in V$ (the collection of all such variables will be simply indicated by $\alpha$), a real variable $\beta_w$ to any internal sampling point $w \in W$ (the collection of all such variables will be simply indicated by $\beta$) and a binary variable $\theta_r$ to any region $r \in \mathcal{R}$, and by imposing constraints that force the non-zero $\alpha_v$ and $\beta_w$ variables to be those corresponding to the same region. The piecewise linear approximation is then

$$\min \sum_{v\in V} \alpha_v f(v) + \sum_{w\in W} \beta_w f(w) \quad (15)$$
$$\alpha_v \geq 0 \qquad v \in V \quad (16)$$
$$\beta_w \geq 0 \qquad w \in W \quad (17)$$
$$\sum_{v\in V} \alpha_v + \sum_{w\in W} \beta_w = 1 \quad (18)$$
$$Ax \leq b \quad (19)$$
$$\sum_{v\in V} \alpha_v g(v) + \sum_{w\in W} \beta_w g(w) \leq 0 \quad (20)$$
$$\sum_{v\in V} \alpha_v v + \sum_{w\in W} \beta_w w = x \quad (21)$$
$$\alpha_v + \sum_{w\in W:\, v,w \in r \in \mathcal{R}} \beta_w \leq \sum_{r\in\mathcal{R}:\, v\in r} \theta_r \qquad v \in V \quad (22)$$
$$\sum_{r\in\mathcal{R}} \theta_r = 1 \quad (23)$$
$$\theta_r \in \{0,1\} \qquad r \in \mathcal{R}, \quad (24)$$

where (19) is the original set of linear constraints, (15)–(18), (20) and (21) express the convex combination of the vertices ((15) and (20) being the approximations of the objective function and of the non-linear constraints, respectively), and (22)–(24) impose that the convex combination uses only the vertices of the unique region $r$ with $\theta_r = 1$, along with the possible additional sampling points within it.

Traditional approaches to linear approximation assume that the regions $r \in \mathcal{R}$ are simplexes with no oversampling, which, for any given $x$, implies that the convex combination is unique. Model (15)–(24) is however valid for any choice of the regions and any oversampling, provided Conditions C1–C3 of Section 1 are satisfied. In particular, if there are regions with more than $L$ linearly independent vertices (see Condition C3) or additional samples are considered, then for some $x$ values the convex combination is not unique, hence the resulting solution minimizes the objective function. We refer to such kind of situations as optimistic modeling. Recall that model (15)–(24) is an approximation and not a valid relaxation of the original problem (see the considerations on global optimization techniques at the end of Section 1). This model adopts a standard formulation. Effective, more complex methods, requiring a smaller number of variables and constraints, were recently proposed by Vielma and Nemhauser (2011) and are discussed and extended to oversampled cases in the second part of Section 3.2.

3.1. Partitioning strategies

From a practical viewpoint, it turns out that the smaller the number of regions in $\mathcal{R}$, the lighter the resulting formulation. On the other hand, the approximation error is bounded by the diameter $d = \max_r d[r]$ of the widest region in $\mathcal{R}$. In this section we evaluate the trade-off between approximation quality and tractability of the MILP problem. Although a specific tuning can only be done when the shape of domain $U$ and the position of the sampling points are known, the potential of optimistic modeling may be appreciated with a few considerations.

Consider a normalized setting in which domain $U$ has an overall unit volume, and establish the approximation quality by prefixing the value of $d$. Within this bound, and depending on their shape, the regions in $\mathcal{R}$ will have a maximum volume of $\lambda$, so that not less than $R = 1/\lambda$ of them are needed in the partition of $U$. Any polytope with a diameter bounded by $d$ fits within a hypersphere of the same diameter, hence its volume is bounded from above by $(d/2)^L \pi^{L/2}/\Gamma(L/2+1)$, where $\Gamma$ is the Gamma function. Hence, $U$ cannot be partitioned in fewer than $R^{\circ} = d^{-L} 2^L \pi^{-L/2} \Gamma(L/2+1)$ regions.

One of the most common triangulations employs regions that have the same volume as the so-called standard simplex, i.e., a simplex with $L$ equal-length sides converging at right angles to a single vertex. A standard simplex with diameter $d$ has a volume equal to $(d/\sqrt{2})^L / L!$, so the total number of regions in $U$ cannot be less than $R^{ss} = d^{-L} 2^{L/2} L!$.

If generic simplex shapes are used with diameter $d$, the volume of each region is bounded from above by the volume of the regular simplex of side $d$, i.e., by $(d/\sqrt{2})^L \sqrt{L+1}/L!$. Hence, any triangulation of $U$ cannot have fewer than $R^{rs} = d^{-L} 2^{L/2} L!/\sqrt{L+1}$ regions.

By using regular hyper-rectangular shapes, a diameter $d$ implies a region volume of $(d/\sqrt{L})^L$ and thus a number of regions equal to $R^{r} = d^{-L} L^{L/2}$.

Fig. 5 reports on a logarithmic scale the ratio between the minimum number of regions needed by the above tessellations and the lower bound $R^{\circ}$, when $L$ varies from 2 to 7, namely $R^{ss}/R^{\circ}$ (dots), $R^{rs}/R^{\circ}$ (squares), and $R^{r}/R^{\circ}$ (diamonds). It is clear that standard simplex triangulation is not the best tessellation strategy, and that triangulations in general yield a number of regions whose trend is unfavorable with respect to that of simple rectangular tessellations. More complex tessellations are difficult to characterize for $L > 3$ because they strongly depend on the dimensionality of the domain.

3.2. Hyper-rectangular domains, regions and triangulation

The most common variable domains $U$ are hyper-rectangles defined by bounds on the variable values. Hence, although any collection of regions satisfying Conditions C1–C3 of Section 1 can be adopted to approximate a given function, it is natural to partition the domain into hyper-rectangular regions, also because this yields the simplest structure of all. For this reason, this section is devoted to the specialization of the MILP model of Section 3 to the case in which $U$ is a hyper-rectangle, and samples of the objective and constraint functions are defined on a regular grid.

Fig. 5. Normalized lower bounds on the number of regions in $\mathcal{R}$ for different partitioning strategies.
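A minimal one-dimensional instance in the spirit of model (15)–(24) (no oversampling, so $W = \emptyset$; the grid, the sampled values, and all variable names are toy assumptions, not data from the paper) can be written with SciPy's MILP interface:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# grid vertices and sampled objective values (assumed toy data)
v = np.array([0.0, 1.0, 2.0, 3.0])
fv = np.array([1.0, -0.5, 0.7, -2.0])        # samples of some non-linear f
nV, nR = len(v), len(v) - 1                  # regions r_j = [v_j, v_{j+1}]

# decision vector: alpha_0..alpha_3 (continuous), theta_0..theta_2 (binary)
c = np.concatenate([fv, np.zeros(nR)])               # (15): min sum alpha_v f(v)
integrality = np.concatenate([np.zeros(nV), np.ones(nR)])

rows, lbs, ubs = [], [], []
rows.append(np.concatenate([np.ones(nV), np.zeros(nR)]))   # (18): sum alpha = 1
lbs.append(1.0); ubs.append(1.0)
rows.append(np.concatenate([np.zeros(nV), np.ones(nR)]))   # (23): one region active
lbs.append(1.0); ubs.append(1.0)
for j in range(nV):          # (22): alpha_j <= sum of theta of incident regions
    row = np.zeros(nV + nR)
    row[j] = 1.0
    for r in range(nR):
        if j in (r, r + 1):  # vertex j belongs to region r
            row[nV + r] = -1.0
    rows.append(row); lbs.append(-np.inf); ubs.append(0.0)

res = milp(c=c, constraints=LinearConstraint(np.array(rows), lbs, ubs),
           integrality=integrality, bounds=Bounds(0, 1))
alpha = res.x[:nV]
x_opt = alpha @ v            # (21): recover x from the convex multipliers
```

With these toy samples the optimum puts all weight on the vertex $v = 3$ ($f = -2.0$), activating the region $[2,3]$; the binaries $\theta_r$ are what prevents mixing multipliers from non-adjacent vertices.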

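The four region counts of Section 3.1 are straightforward to evaluate numerically; the sketch below (function name and the chosen $L$, $d$ values are illustrative assumptions) reproduces the trend shown in Fig. 5:

```python
from math import gamma, factorial, pi, sqrt

def region_counts(L, d):
    """Minimum number of regions of diameter <= d partitioning a unit-volume
    domain: sphere lower bound R°, standard simplexes R^ss, regular simplexes
    R^rs, and hyper-rectangles R^r (formulas of Section 3.1)."""
    lower = d**-L * 2**L * pi**(-L / 2) * gamma(L / 2 + 1)      # R°
    std_simplex = d**-L * 2**(L / 2) * factorial(L)             # R^ss
    reg_simplex = d**-L * 2**(L / 2) * factorial(L) / sqrt(L + 1)  # R^rs
    hyper_rect = d**-L * L**(L / 2)                             # R^r
    return lower, std_simplex, reg_simplex, hyper_rect

for L in range(2, 8):
    lo, ss, rs, hr = region_counts(L, 1.0)
    print(L, round(ss / lo, 2), round(rs / lo, 2), round(hr / lo, 2))
```

As in Fig. 5, the normalized counts for the simplex tessellations grow quickly with $L$, while the hyper-rectangular one stays comparatively small.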

We assume that $n_l$ points $x^l_1 < x^l_2 < \dots < x^l_{n_l-1} < x^l_{n_l}$ ($x^l_1$ and $x^l_{n_l}$ being the lower and upper bounds, respectively) are given for each $l = 1,\dots,L$, and that the point $v(j_1,\dots,j_L) = (x^1_{j_1},\dots,x^L_{j_L})$ is a sampling point for any set of indices $j_l = 1,\dots,n_l$ and for any $l = 1,\dots,L$. The domain is partitioned into $\prod_{l=1}^{L}(n_l-1)$ hyper-rectangular regions (each of them identified by the $L$-ple of indices identifying its ``bottom-left'' corner)

$$r(j_1,\dots,j_L) = \left[x^1_{j_1}, x^1_{j_1+1}\right] \times \dots \times \left[x^L_{j_L}, x^L_{j_L+1}\right] \quad (25)$$

for $j_l = 1,\dots,n_l-1$ and $l = 1,\dots,L$. Sub-domain $r(j_1,\dots,j_L)$ has the $2^L$ vertices $v(k_1,\dots,k_L)$ for all possible assignments of $k_l \in \{j_l, j_l+1\}$ ($l = 1,\dots,L$).

When a full triangulation of a hyper-rectangular $U$ is adopted, it is common to proceed by steps. First, $U$ is partitioned into hyper-rectangular sub-domains $r(j_1,\dots,j_L)$ as in (25). Then, each sub-domain is further partitioned into $L!$ equivalent simplexes that share a single edge, which is also one of the possible diagonals of the containing hyper-rectangle. Common diagonals change from one sub-domain to an adjacent one, following a regular pattern that avoids the possibility of identifying a preferred direction. Due to the well-recognizable pattern of its two-dimensional realization, such a globally isotropic triangulation is often referred to as Union Jack (Todd, 1977). Simplexes are all equivalent to the standard one, and their arrangement allows the implementation of extremely effective high-dimensional interpolators when the time needed to compute the value of the approximation at a given point is the key factor (Kuhn, 1960; Rovatti, Borgatti, & Guerrieri, 1998).

When straightforwardly embedded in an MILP model, this approach requires a binary variable per region (Lee & Wilson, 2001), hence a total of $L! \prod_{l=1}^{L}(n_l-1)$ binary variables. We next show that in the approach we propose the number of binary variables is drastically reduced by defining the regions as hyper-rectangles, hence avoiding the partition through simplexes.

For the case of hyper-rectangular domains, model (15)–(24) can be stated as follows. For each vertex $v = v(j_1,\dots,j_L)$, we represent $\alpha_v$ as $\alpha(j_1,\dots,j_L)$ and define the set $J$ of all possible index $L$-tuples, $J = \{1,\dots,n_1\} \times \{1,\dots,n_2\} \times \dots \times \{1,\dots,n_L\}$, along with the set $J(l,j^*)$ of all possible index $L$-tuples in which the $l$-th index is fixed to $j^*$, i.e., $J(l,j^*) = \{1,\dots,n_1\} \times \dots \times \{1,\dots,n_{l-1}\} \times \{j^*\} \times \{1,\dots,n_{l+1}\} \times \dots \times \{1,\dots,n_L\}$. Then the piecewise linear approximation (15)–(24) can be expressed as

$$\min \sum_{j_1=1}^{n_1} \dots \sum_{j_L=1}^{n_L} \alpha(j_1,\dots,j_L) f(v(j_1,\dots,j_L)) + \sum_{w\in W} \beta_w f(w) \quad (26)$$
$$\alpha(j_1,\dots,j_L) \geq 0 \qquad (j_1,\dots,j_L) \in J \quad (27)$$
$$\beta_w \geq 0 \qquad w \in W \quad (28)$$
$$\sum_{j_1=1}^{n_1} \dots \sum_{j_L=1}^{n_L} \alpha(j_1,\dots,j_L) + \sum_{w\in W} \beta_w = 1 \quad (29)$$
$$Ax \leq b \quad (30)$$
$$\sum_{j_1=1}^{n_1} \dots \sum_{j_L=1}^{n_L} \alpha(j_1,\dots,j_L) g(v(j_1,\dots,j_L)) + \sum_{w\in W} \beta_w g(w) \leq 0 \quad (31)$$
$$\sum_{j_1=1}^{n_1} \dots \sum_{j_L=1}^{n_L} \alpha(j_1,\dots,j_L) v(j_1,\dots,j_L) + \sum_{w\in W} \beta_w w = x \quad (32)$$
$$\sum_{(j_1,\dots,j_L)\in J(l,j^*)} \alpha(j_1,\dots,j_L) + \sum_{w\in W:\, w\in r(j_1,\dots,j_L)} \beta_w \leq \theta^l_{j^*-1} + \theta^l_{j^*} \qquad l = 1,\dots,L;\ j^* = 1,\dots,n_l \quad (33)$$
$$\sum_{j=1}^{n_l-1} \theta^l_j = 1 \qquad l = 1,\dots,L \quad (34)$$
$$\theta^l_j \in \{0,1\} \qquad j = 1,\dots,n_l-1;\ l = 1,\dots,L, \quad (35)$$

where $\theta^l_0 = 0$ for each $l = 1,\dots,L$. Equations (26)–(32) and (35) are the straightforward counterparts of equations (15)–(21) and (24), respectively. Consider now (34), and let $j^*_l$ denote the unique index $j$ for which $\theta^l_j$ takes the value one ($l = 1,\dots,L$): the set of such indices identifies the selected region $r(j^*_1,\dots,j^*_L)$ (see (25)). It follows that (33) automatically sets to zero all $\alpha$ coefficients corresponding to vertices which do not belong to the selected region and the $\beta$ coefficients corresponding to additional sampling points outside that region.

Since this is separately imposed for each of the $L$ dimensions, the model requires in total only $\sum_{l=1}^{L}(n_l-1)$ binary variables for modeling $\prod_{l=1}^{L}(n_l-1)$ regions. This compares very favorably with the number $L! \prod_{l=1}^{L}(n_l-1)$ of binary variables required by the classical models for triangulation.

Recently, Vielma and Nemhauser (2011) proposed new, more complex methods for modeling triangulations, based on properly encoded binary decision trees, which require a number of binary variables and constraints that is logarithmic in the number of regions. When applied to non-oversampled models (i.e., to models with $W = \emptyset$), the method relies on $B = \lceil \log_2(|\mathcal{R}|) \rceil$ binary variables $\theta_j \in \{0,1\}$ for $j = 1,\dots,B$ and substitutes constraints (33)–(35) with $2B$ inequalities of the kind

$$\sum_{v\in V_j} \alpha_v \leq \sum_{u\in H_j} \theta_u + \sum_{u\in\{1,\dots,B\}\setminus H_j} (1-\theta_u) \quad (36)$$

for $j = 1,\dots,2B$ and properly chosen sets $V_j \subseteq V$ and $H_j \subseteq \{1,\dots,B\}$. In our setting, the same inequalities can be used to accommodate oversampling by generalizing them to

$$\sum_{v\in V_j} \alpha_v + \sum_{w\in W:\, \exists s\in s(\mathcal{R}),\ \exists v\in V_j:\ v,w\in s} \beta_w \leq \sum_{u\in H_j} \theta_u + \sum_{u\in\{1,\dots,B\}\setminus H_j} (1-\theta_u), \quad (37)$$

where $s(\mathcal{R})$ is the triangulation of $\mathcal{R}$, i.e., the collection of all the simplexes triangulating each region of $\mathcal{R}$.

From (33) and (37) we get that, both in the standard and logarithmic implementations, once the model is laid down encoding the samples on the grid, further samples can be injected into it by adding a single column per additional sample. Moreover, for hyper-rectangular domains sampled on regular grids, the $B$ binary variables and $2B$ constraints (36) (and thus the corresponding constraints (37)) can be partitioned into two disjoint sets, one dedicated to modeling the selection of a single hyper-rectangular region and the other in charge of selecting a single simplex within that hyper-rectangle. Hence, the optimistic model can be implemented exploiting the results in Vielma and Nemhauser (2011), generalized as in (37), by simply discarding the second set of binary variables and constraints (i.e., rows). In Section 4 we provide a computational comparison of the different implementations of both methods.

4. Computational experiments

In this section we perform a detailed computational evaluation of the proposed approximation within MILP models. Specifically, in the next subsections we consider three simple Non-Linear Programming (NLP) problems in two, three, and four variables, respectively, to be piecewise linearly approximated by our optimistic modeling. Although such problems are easy and small, they allow

Please cite this article in press as: Rovatti, R., et al. Optimistic MILP modeling of non-linear optimization problems. European Journal of Operational Research (2014), http://dx.doi.org/10.1016/j.ejor.2014.03.020 8 R. Rovatti et al. / European Journal of Operational Research xxx (2014) xxx–xxx an evaluation of the impact of the optimistic modeling within an Results in Table 1 are very satisfactory. The percentage speed- MILP approach. The resulting MILP models were solved by CPLEX up obtained by the optimistic models is consistent and often very version 12.4 on a PC equipped with an hyperthread-enabled Intel impressive. This is true both for the standard and the logarithmic Core-i7 and 16 gigabyte of RAM. implementations. It is remarkable that the reduction in the MILP In Section 4.4 we consider a real-world Mixed Integer Non-Lin- model size obtained by applying the optimistic approach corre- ear Programming (MINLP) problem arising in an industrial context. sponds to a significant advantage for the MILP solver even when applied on top of the already very compact logarithmic formula- 4.1. An NLP in two variables tion. This shows that the optimistic models are also somehow eas- ier to ‘‘assimilate’’ by the solver. Consider the NLP with two variables, a non-linear objective The quality of the solutions if no oversampling is applied is ex- function and a non-linear constraint actly the same between ‘‘UJ’’ and ‘‘HR’’. However, if oversampling is used, a sometimes impressive improvement of both ‘‘UJ’’ and ‘‘HR’’ 2 2 8 x1 3 y2 max f ¼ e ðÞ3 ðÞ3 ð38Þ with respect to their standard versions can be observed, with the only exception of m ¼ 9. Although it only happens in two cases 1 2 1 2 1 10 x 10 y 6 0 ð39Þ and with a quite small difference (namely, m ¼ 129 and 2 2 m ¼ 257), the version of ‘‘HR’’ with oversampling seems to obtain ðx; yÞ2½0; 12; ð40Þ slightly better solutions (in terms of quality) than the correspond- ing version of ‘‘UJ’’. 
Computing times of the versions with and whose optimal solution is x ¼ 0:309054; y ¼ 0:752071 with objec- without oversampling are very comparable. Just to give a rough tive function value f ¼ 0:973753. information of CPU times in seconds, the highest values, corre- In order to piecewise linear approximate both the objective sponding to the case m ¼ 513, are around 450 CPU seconds. function (38) and the non-linear constraint (39) and obtain an Fig. 6 gives an intuitive view of the effect of the piecewise linear 2 MILP, the square domain ½0; 1 is partitioned, for different values approximation both on the objective function and on the con- 2 of m, into ðm 1Þ square sub-domains straint. Specifically, Fig. 6(a) reports, for m ¼ 5, the plot of the opti-  j 1 j j 1 j mistically approximated objective function within the rðj ; j Þ¼ 1 ; 1 2 ; 2 ; 1 2 m 1 m 1 m 1 m 1 approximated constraint, to be compared with the original objec- tive function and constraint in Fig. 6(b). In both pictures the with j1; j2 ¼ 1; 2; ...; m 1 for m 2f3; 5; 9; 17; 33; 65; 129; 257; 513g. light-gray dot indicates the approximated optimal solution, to be

Models using rðj1; j2Þ as regions are indicated as ‘‘HR’’, while compared with the true solution represented by the black dot. models further partitioning each rðj1; j2Þ into triangles following the Union Jack pattern are indicated as ‘‘UJ’’. The fact that an opti- 4.2. An NLP in three variables mistic approximation is used that implicitly considers a piecewise- linear concave interpolation of the objective function in any region Op Consider the NLP with three variables, a non-linear objective is indicated with an superscript. function and two linear constraints The standard implementation ((15)–(24) for full UJ models and 2 2 (26)–(35) for HR models) is indicated with ‘‘-std’’, while the loga- 2 8 x1 cosð2pzÞ1 8 y1 sinð2pzÞ1 max f ¼½1 þ sinðpz Þe ½5 2 ½5 2 ð41Þ rithmic implementation (Vielma & Nemhauser, 2011) is indicated 6 with ‘‘-log’’. When oversampling is considered, a further ‘‘-ovr’’ is x þ y þ z 6 0 ð42Þ added to indicate that 10 points, randomly drawn with a uniform 5 probability in the rectangle containing the true optimum solution, x þ y 6 0 ð43Þ have been added as non-structured samples. ðx; y; zÞ2½0; 13; ð44Þ As an example, HROp-log-ovr indicates a model with hyper-rect- angular regions, with a logarithmic implementation and oversam- whose optimal solution is x ¼ y ¼ 0:291929; z ¼ 0:616142, with pling that implicitly uses optimistic approximation of the original objective function value f ¼ 1:79436. problem. The cubic domain ½0; 13 is partitioned into ðm 1Þ3 cubic sub- The results are reported in Table 1. As far as the sizes of the four domains models are concerned, we report, for each value of m, the number  of constraints (constr.), the overall number of variables (var.), the j1 1 j1 j2 1 j2 j3 1 j3 rðj1; j2; j3Þ¼ ; ; ; ; number of non-zero entries (non-zero), and the number of binary m 1 m 1 m 1 m 1 m 1 m 1 variables (0–1 var.).

The performance is reported as the time needed by CPLEX to with j1; j2; j3 ¼ 1; 2; ...; m 1 and for m 2f3; 5; 9; 17; 33; 65; 129g. complete optimization expressed in CPLEX ticks (i.e., roughly, an As m increases, standard non-logarithmic implementations be- estimation of the computing time that is based on the determinis- come unmanageable. Hence, for both HR and UJ models, only the tic number of operations performed by the MILP solver, and that is logarithmic implementation (Vielma & Nemhauser, 2011) is con- therefore independent of the current overhead due to the machine sidered in which oversampling is possibly introduced by consider- load), the percent gain in solution time due to the adoption of an ing 10 points, randomly drawn with a uniform probability on the HR model instead of an UJ model, the adjustment in the constraint cube containing the true optimum solution. radius (p1ffiffiffiffi in the non-approximated problem) that is needed to The results are reported in Table 2 using the same notation as in 10 make the CPLEX solution feasible (the constraint is non-linear Table 1. and thus approximated), the value of f computed at the CPLEX The analysis of the results in Table 2 is quite similar to that of solution and the corresponding percent loss with respect to the Table 1. The speed-up is consistent and significant, and we are con- true maximum (possibly after the relaxation needed to make the sidering here only the logarithmic approach. There are more differ- CPLEX solution feasible). ences here in the solution quality of the ‘‘UJ’’ models with respect As a preliminary remark, note that the approximation of the to the ‘‘HR’’ ones, even without oversampling. Overall, as far as the constraint is very good for m > 3 and improves consistently as m objective function value is concerned, the best model is HROp-log- increases. ovr the only exception being m ¼ 9. Finally, though the improved


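The binary-variable counts discussed above (one binary per simplex for UJ, i.e. $L!\,\prod_l(n_l-1)$, versus $B=\lceil\log_2 R\rceil$ for the logarithmic encodings) can be cross-checked against the 0–1 var. column of Table 1. A minimal sketch, under the assumption of a regular grid of $m$ points per dimension (so $n_l = m$ and $R=(m-1)^L$ hyper-rectangles):

```python
import math

def uj_std_binaries(m, L):
    # One binary per simplex: L! * prod_l (n_l - 1), with n_l = m.
    return math.factorial(L) * (m - 1) ** L

def uj_log_binaries(m, L):
    # Logarithmic encoding over the simplexes.
    return math.ceil(math.log2(uj_std_binaries(m, L)))

def hr_log_binaries(m, L):
    # B = ceil(log2 R), with R = (m - 1)^L hyper-rectangular regions.
    return math.ceil(math.log2((m - 1) ** L))

# Cross-check against the 0-1 var. column of Table 1 (L = 2).
assert uj_std_binaries(3, 2) == 8 and uj_std_binaries(17, 2) == 512
assert uj_log_binaries(9, 2) == 7 and uj_log_binaries(33, 2) == 11
assert hr_log_binaries(5, 2) == 4 and hr_log_binaries(129, 2) == 14
```

The gap between `uj_std_binaries` and `hr_log_binaries` (e.g., 512 vs. 8 for $m=17$) is the "drastic reduction" the text refers to.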
Table 1. Problem (38)–(40): sizes of the MILP models resulting from different approximations, implementations, and oversampling, along with the resulting performance.

m   | Model        | Constr. | Var.    | Non-zero  | 0-1 var. | Time (ticks) | Gain HR over UJ (%) | Relax. of non-linear constr. | Value of solution | Loss wrt (possibly relaxed) true max. (%)
----|--------------|---------|---------|-----------|----------|--------------|---------------------|------------------------------|-------------------|------------------------------------------
3   | UJ-std       | 11      | 17      | 46        | 8        | 2.0e-1       | --                  | 1.1623e-1                    | 7.9807e-1         | 25
3   | HROp-std     | 6       | 11      | 30        | 2        | 9.0e-2       | 55                  | 1.1623e-1                    | 7.9807e-1         | 25
3   | UJOp-std-ovr | 11      | 27      | 86        | 8        | 2.0e-1       | --                  | 0                            | 9.4287e-1         | 3.3
3   | HROp-std-ovr | 6       | 21      | 70        | 2        | 1.4e-1       | 30                  | 0                            | 9.4287e-1         | 3.3
3   | UJ-log       | 8       | 12      | 36        | 3        | 9.0e-2       | --                  | 1.1623e-1                    | 7.9807e-1         | 25
3   | HROp-log     | 6       | 11      | 30        | 2        | 8.0e-2       | 11                  | 1.1623e-1                    | 7.9807e-1         | 25
3   | UJOp-log-ovr | 8       | 22      | 86        | 3        | 1.6e-1       | --                  | 0                            | 9.4287e-1         | 3.3
3   | HROp-log-ovr | 6       | 21      | 70        | 2        | 1.1e-1       | 31                  | 0                            | 9.4287e-1         | 3.3
5   | UJ-std       | 28      | 57      | 195       | 32       | 7.6e-1       | --                  | 0                            | 9.2646e-1         | 5.1
5   | HROp-std     | 12      | 31      | 112       | 6        | 2.3e-1       | 70                  | 0                            | 9.2646e-1         | 5.1
5   | UJOp-std-ovr | 28      | 67      | 245       | 32       | 8.9e-1       | --                  | 0                            | 9.6339e-1         | 1.1
5   | HROp-std-ovr | 12      | 41      | 172       | 6        | 3.2e-1       | 64                  | 0                            | 9.6339e-1         | 1.1
5   | UJ-log       | 12      | 30      | 134       | 5        | 2.2e-1       | --                  | 0                            | 9.2646e-1         | 5.1
5   | HROp-log     | 10      | 29      | 120       | 4        | 2.0e-1       | 9.1                 | 0                            | 9.2646e-1         | 5.1
5   | UJOp-log-ovr | 12      | 40      | 204       | 5        | 3.2e-1       | --                  | 0                            | 9.6339e-1         | 1.1
5   | HROp-log-ovr | 10      | 39      | 180       | 4        | 2.8e-1       | 12                  | 0                            | 9.6339e-1         | 1.1
9   | UJ-std       | 84      | 209     | 747       | 128      | 4.14         | --                  | 6.21e-3                      | 9.7721e-1         | 0.031
9   | HROp-std     | 20      | 95      | 368       | 14       | 7.9e-1       | 81                  | 6.21e-3                      | 9.7721e-1         | 0.031
9   | UJOp-std-ovr | 84      | 219     | 797       | 128      | 4.27         | --                  | 1.1806e-3                    | 9.6293e-1         | 1.2
9   | HROp-std-ovr | 20      | 105     | 428       | 14       | 9.9e-1       | 77                  | 1.1806e-3                    | 9.6293e-1         | 1.2
9   | UJ-log       | 16      | 88      | 568       | 7        | 7.0e-1       | --                  | 6.21e-3                      | 9.7721e-1         | 0.031
9   | HROp-log     | 14      | 87      | 526       | 6        | 6.4e-1       | 8.6                 | 6.21e-3                      | 9.7721e-1         | 0.031
9   | UJOp-log-ovr | 16      | 98      | 658       | 7        | 9.2e-1       | --                  | 1.1806e-3                    | 9.6293e-1         | 1.2
9   | HROp-log-ovr | 14      | 97      | 606       | 6        | 8.3e-1       | 9.8                 | 1.1806e-3                    | 9.6293e-1         | 1.2
17  | UJ-std       | 292     | 801     | 2899      | 512      | 9.61         | --                  | 3.8427e-4                    | 9.7391e-1         | 0.009
17  | HROp-std     | 38      | 321     | 1236      | 32       | 3.08         | 68                  | 3.8427e-4                    | 9.7391e-1         | 0.009
17  | UJOp-std-ovr | 292     | 811     | 2949      | 512      | 8.81         | --                  | 1.2066e-4                    | 9.7377e-1         | 0.0057
17  | HROp-std-ovr | 38      | 331     | 1296      | 32       | 3.21         | 64                  | 1.2066e-4                    | 9.7377e-1         | 0.0057
17  | UJ-log       | 20      | 298     | 2526      | 9        | 3.17         | --                  | 3.8427e-4                    | 9.7391e-1         | 0.009
17  | HROp-log     | 18      | 297     | 2380      | 8        | 2.94         | 7.3                 | 3.8427e-4                    | 9.7391e-1         | 0.009
17  | UJOp-log-ovr | 20      | 308     | 2636      | 9        | 3.32         | --                  | 1.2066e-4                    | 9.7377e-1         | 0.0057
17  | HROp-log-ovr | 18      | 307     | 2480      | 8        | 3.07         | 7.5                 | 1.2066e-4                    | 9.7377e-1         | 0.0057
33  | UJ-std       | 1092    | 3137    | 11,443    | 2048     | 8.399e1      | --                  | 1.8708e-4                    | 9.7378e-1         | 0.0093
33  | HROp-std     | 70      | 1153    | 4532      | 64       | 2.823e1      | 66                  | 1.8708e-4                    | 9.7378e-1         | 0.0093
33  | UJOp-std-ovr | 1092    | 3147    | 11,493    | 2048     | 4.667e1      | --                  | 7.7902e-5                    | 9.7374e-1         | 0.0064
33  | HROp-std-ovr | 70      | 1163    | 4592      | 64       | 2.715e1      | 42                  | 7.7902e-5                    | 9.7374e-1         | 0.0064
33  | UJ-log       | 24      | 1100    | 11,572    | 11       | 1.348e1      | --                  | 1.8708e-4                    | 9.7378e-1         | 0.0093
33  | HROp-log     | 22      | 1099    | 11,026    | 10       | 1.276e1      | 5.3                 | 1.8708e-4                    | 9.7378e-1         | 0.0093
33  | UJOp-log-ovr | 24      | 1110    | 11,702    | 11       | 1.367e1      | --                  | 7.7902e-5                    | 9.7374e-1         | 0.0064
33  | HROp-log-ovr | 22      | 1109    | 11,146    | 10       | 1.287e1      | 5.9                 | 7.7902e-5                    | 9.7374e-1         | 0.0064
65  | UJ-std       | 4228    | 12,417  | 45,419    | 8192     | 5.4867e2     | --                  | 8.0005e-5                    | 9.7371e-1         | 0.0094
65  | HROp-std     | 134     | 4353    | 17,260    | 128      | 1.4623e2     | 73                  | 8.0005e-5                    | 9.7371e-1         | 0.0094
65  | UJOp-std-ovr | 4228    | 12,427  | 45,469    | 8192     | 4.0462e2     | --                  | 1.5477e-5                    | 9.7376e-1         | 0.00018
65  | HROp-std-ovr | 134     | 4363    | 17,320    | 128      | 2.595e1      | 94                  | 1.5477e-5                    | 9.7376e-1         | 0.00018
65  | UJ-log       | 28      | 4238    | 53,074    | 13       | 4.117e1      | --                  | 8.0005e-5                    | 9.7371e-1         | 0.0094
65  | HROp-log     | 26      | 4237    | 50,960    | 12       | 4.018e1      | 2.4                 | 8.0005e-5                    | 9.7371e-1         | 0.0094
65  | UJOp-log-ovr | 28      | 4248    | 53,224    | 13       | 4.034e1      | --                  | 1.5477e-5                    | 9.7376e-1         | 0.00018
65  | HROp-log-ovr | 26      | 4247    | 51,100    | 12       | 3.938e1      | 2.4                 | 1.5477e-5                    | 9.7376e-1         | 0.00018
129 | UJ-std       | 16,644  | 49,409  | 180,963   | 32,768   | 5.1699e3     | --                  | 4.2528e-5                    | 9.7377e-1         | 0.0005
129 | HROp-std     | 262     | 16,897  | 67,300    | 256      | 1.3438e2     | 97                  | 4.2528e-5                    | 9.7377e-1         | 0.0005
129 | UJOp-std-ovr | 16,644  | 49,419  | 181,013   | 32,768   | 5.1723e3     | --                  | 3.3548e-6                    | 9.7375e-1         | 0.001
129 | HROp-std-ovr | 262     | 16,907  | 67,360    | 256      | 1.161e2      | 98                  | 4.3454e-6                    | 9.7375e-1         | 0.00012
129 | UJ-log       | 32      | 16,656  | 241,808   | 15       | 1.877e2      | --                  | 4.2528e-5                    | 9.7377e-1         | 0.0005
129 | HROp-log     | 30      | 16,655  | 233,486   | 14       | 1.8054e2     | 3.8                 | 4.2528e-5                    | 9.7377e-1         | 0.0005
129 | UJOp-log-ovr | 32      | 16,666  | 241,978   | 15       | 2.0998e2     | --                  | 3.3548e-6                    | 9.7375e-1         | 0.001
129 | HROp-log-ovr | 30      | 16,665  | 233,646   | 14       | 1.7714e2     | 16                  | 4.3454e-6                    | 9.7375e-1         | 0.00012
257 | UJ-std       | 66,052  | 197,121 | 722,387   | 131,072  | 7.6506e4     | --                  | 6.0169e-6                    | 9.7376e-1         | 0.00019
257 | HROp-std     | 518     | 66,561  | 265,684   | 512      | 5.9009e2     | 99                  | 6.0169e-6                    | 9.7376e-1         | 0.00019
257 | UJOp-std-ovr | 66,052  | 197,131 | 722,437   | 131,072  | 7.6318e4     | --                  | 1.0499e-6                    | 9.7375e-1         | 0.000048
257 | HROp-std-ovr | 518     | 66,571  | 265,744   | 512      | 6.0133e2     | 99                  | 2.4834e-7                    | 9.7375e-1         | 0.0000036
257 | UJ-log       | 36      | 66,066  | 1,090,822 | 17       | 8.4238e2     | --                  | 6.0169e-6                    | 9.7376e-1         | 0.00019
257 | HROp-log     | 34      | 66,065  | 1,057,796 | 16       | 8.0241e2     | 4.7                 | 6.0169e-6                    | 9.7376e-1         | 0.00019
257 | UJOp-log-ovr | 36      | 66,076  | 1,091,012 | 17       | 8.1737e2     | --                  | 1.0499e-6                    | 9.7375e-1         | 0.000048
257 | HROp-log-ovr | 34      | 66,075  | 1,057,976 | 16       | 7.8382e2     | 4.1                 | 2.4834e-7                    | 9.7375e-1         | 0.0000036
513 | UJ-std       | 263,172 | 787,457 | 2,886,595 | 524,288  | 1.2039e6     | --                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | HROp-std     | 1030    | 264,193 | 1,055,684 | 1024     | 3.7253e3     | 100                 | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | UJOp-std-ovr | 263,172 | 787,467 | 2,886,645 | 524,288  | 1.2039e6     | --                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | HROp-std-ovr | 1030    | 264,203 | 1,055,744 | 1024     | 3.7225e3     | 100                 | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | UJ-log       | 40      | 263,188 | 4,870,652 | 19       | 4.5976e3     | --                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | HROp-log     | 38      | 263,187 | 4,739,066 | 18       | 3.8301e3     | 17                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | UJOp-log-ovr | 40      | 263,198 | 4,870,862 | 19       | 4.5966e3     | --                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
513 | HROp-log-ovr | 38      | 263,197 | 4,739,266 | 18       | 3.8283e3     | 17                  | 4.4481e-7                    | 9.7375e-1         | 0.00019
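The "gain in solution time" column of Table 1 is consistent with the natural definition $100\,(1 - t_{\mathrm{HR}}/t_{\mathrm{UJ}})$ applied to the reported tick counts; a quick check on the standard implementations for the first three values of $m$:

```python
# (t_UJ, t_HR, reported gain %) for UJ-std vs HROp-std, taken from Table 1.
rows = {3: (0.2, 0.09, 55), 5: (0.76, 0.23, 70), 9: (4.14, 0.79, 81)}

for m, (t_uj, t_hr, gain) in rows.items():
    # Rounded relative speed-up matches the table entry.
    assert round(100 * (1 - t_hr / t_uj)) == gain
```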

performance of oversampled models comes at some computational price, differences remain reasonable.

4.3. An NLP in four variables

Consider the NLP with four variables, a non-linear objective function and a linear constraint

  max $f = (1 + w + x + y + z)\,\sin\!\left(2\pi\left(w+\tfrac{1}{5}\right)\right) \sin\!\left(2\pi\left(x+\tfrac{2}{5}\right)\right) \sin\!\left(2\pi\left(y+\tfrac{3}{5}\right)\right) \sin\!\left(2\pi\left(z+\tfrac{4}{5}\right)\right)$   (45)

  $w + x + y + z \le \frac{5}{3}$   (46)

  $(w,x,y,z) \in [0,1]^4$,   (47)

whose optimal solution is $w^* = 0.0623355$, $x^* = 0.362335$, $y^* = 0.162335$, $z^* = 0.462335$ with objective function value $f^* = 2.02484$.

The domain $[0,1]^4$ is partitioned into $(m-1)^4$ hyper-rectangular sub-domains

  $r(j_1,j_2,j_3,j_4) = \left[\frac{j_1-1}{m-1}, \frac{j_1}{m-1}\right] \times \left[\frac{j_2-1}{m-1}, \frac{j_2}{m-1}\right] \times \left[\frac{j_3-1}{m-1}, \frac{j_3}{m-1}\right] \times \left[\frac{j_4-1}{m-1}, \frac{j_4}{m-1}\right]$

with $j_1, j_2, j_3, j_4 = 1,2,\dots,m-1$, and $m \in \{3,5,9,17,33\}$.

In this case too, as $m$ increases, standard implementations become unmanageable. Hence, the tested models are the same as in the 3-variable case. The results, reported in Table 3, are consistent with those in Tables 1 and 2, with the only exception of small values of $m$ (3 and 5), for which the approximated solutions are very far away from the optimal one. This is due to the high dimensionality of the problem, whose geometry is badly captured by too coarse partitions. For higher values of $m$, the speed-up is significant, the quality improves by doing oversampling, and oversampling is not in general slower.

4.4. A real-world MINLP

The final step of this experimental evaluation deals with a real-world MINLP. We consider the problem of finding the optimal scheduling of a multi-unit hydro power station, known as the Short-Term Hydro Scheduling problem. In a short-term time horizon it is required to maximize the power selling revenue by assuming that (a) the generation company acts as a price-taker, and (b) the electricity prices and the inflows are forecast. We consider the MINLP model presented in Borghetti et al. (2008), in which all constraints are linear, while the objective function has a non-linear term, as the power production is a non-convex, non-concave function $\varphi(q,v)$ of the water flow $q$ and the water volume $v$ in the reservoir.

In Borghetti et al. (2008) a specific approach for the piecewise approximation of functions of two variables $f(x_1,x_2)$ has been proposed. The method, called Improved Rectangle (IR) modeling, was later analyzed in detail in D'Ambrosio et al. (2010) and consists of a "basic approximation" and of its quality improvement. The basic approximation is obtained by sampling one of the two variables, say $x_2$, for a fixed number of values, say $\ell$, and using a piecewise linear approximation to linearize the $\ell$ resulting univariate functions. The approximation quality improvement is obtained by adding a "correction factor" that takes into account the difference between the real value of variable $x_2$ and the two closest among the $\ell$ sampling points. Although the IR modeling is not as general as the UJ and HR ones (for example, it does not directly extend to functions of more than two variables and does not admit a logarithmic implementation), we include a comparison with its performance mainly because it has been developed and tailored for the problem at hand.

The other approximations we consider are those proposed in this paper, with no oversampling. In fact, though we performed some experiments with oversampling and obtained non-negligible improvements in the quality of some of the recomputed solutions, the results are not conclusive and we decided not to report them in full detail. The reason for that is twofold. On the one side, the objective function to be approxi-


Fig. 6. Pictorial view of the approximation effect.

Table 2. Problem (41)–(44): sizes of the MILP models resulting from different approximations and oversampling, along with the resulting performance.

m   | Model        | Constr. | Var.      | Non-zero   | 0-1 var. | Time (ticks) | Gain HR over UJ (%) | Value of solution | Loss wrt true max. (%)
----|--------------|---------|-----------|------------|----------|--------------|---------------------|-------------------|-----------------------
3   | UJ-log       | 17      | 35        | 189        | 6        | 3.5e-1       | --                  | 9.8415e-1         | 82
3   | HROp-log     | 11      | 32        | 147        | 3        | 2.3e-1       | 34                  | 9.8415e-1         | 82
3   | UJOp-log-ovr | 17      | 45        | 289        | 6        | 1.06         | --                  | 1.6682            | 7.6
3   | HROp-log-ovr | 11      | 42        | 217        | 3        | 5.1e-1       | 52                  | 1.7942            | 0.0083
5   | UJ-log       | 23      | 136       | 1154       | 9        | 8.18         | --                  | 1.4637            | 23
5   | HROp-log     | 17      | 133       | 968        | 6        | 4.42         | 46                  | 1.6988            | 5.6
5   | UJOp-log-ovr | 23      | 146       | 1284       | 9        | 2.4e1        | --                  | 1.7771            | 0.97
5   | HROp-log-ovr | 17      | 143       | 1068       | 6        | 2.29         | 91                  | 1.7878            | 0.37
9   | UJ-log       | 29      | 743       | 8643       | 12       | 1.104e1      | --                  | 1.7908            | 0.2
9   | HROp-log     | 23      | 740       | 7557       | 9        | 9.51         | 14                  | 1.7908            | 0.2
9   | UJOp-log-ovr | 29      | 753       | 8803       | 12       | 3.539e1      | --                  | 1.7908            | 0.2
9   | HROp-log-ovr | 23      | 750       | 7687       | 9        | 1.421e1      | 60                  | 1.7537            | 2.3
17  | UJ-log       | 35      | 4930      | 72,116     | 15       | 1.2882e2     | --                  | 1.7908            | 0.2
17  | HROp-log     | 29      | 4927      | 64,766     | 12       | 1.2111e2     | 6                   | 1.7908            | 0.2
17  | UJOp-log-ovr | 35      | 4940      | 72,306     | 15       | 1.6327e2     | --                  | 1.7908            | 0.2
17  | HROp-log-ovr | 29      | 4937      | 64,926     | 12       | 1.2128e2     | 26                  | 1.7939            | 0.027
33  | UJ-log       | 41      | 35,957    | 632,157    | 18       | 1.11e3       | --                  | 1.7908            | 0.2
33  | HROp-log     | 35      | 35,954    | 578,295    | 15       | 1.0848e3     | 2.3                 | 1.7938            | 0.033
33  | UJOp-log-ovr | 41      | 35,967    | 632,377    | 18       | 1.7954e3     | --                  | 1.7934            | 0.055
33  | HROp-log-ovr | 35      | 35,964    | 578,485    | 15       | 9.828e2      | 45                  | 1.7944            | 0.000032
65  | UJ-log       | 47      | 274,648   | 5,642,438  | 21       | 1.4683e4     | --                  | 1.7923            | 0.11
65  | HROp-log     | 41      | 274,645   | 5,230,592  | 18       | 7.2232e3     | 51                  | 1.7941            | 0.017
65  | UJOp-log-ovr | 47      | 274,658   | 5,642,688  | 21       | 1.5978e4     | --                  | 1.7927            | 0.094
65  | HROp-log-ovr | 41      | 274,655   | 5,230,812  | 18       | 9.192e3      | 42                  | 1.7943            | 0.0033
129 | UJ-log       | 53      | 2,146,715 | 50,496,975 | 24       | 1.652e5      | --                  | 1.7943            | 0.0027
129 | HROp-log     | 47      | 2,146,712 | 47,277,129 | 21       | 9.0823e4     | 45                  | 1.7943            | 0.0027
129 | UJOp-log-ovr | 53      | 2,146,725 | 50,497,255 | 24       | 1.4794e5     | --                  | 1.7943            | 0.0027
129 | HROp-log-ovr | 47      | 2,146,722 | 47,277,379 | 21       | 1.2422e5     | 16                  | 1.7943            | 0.0024
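The optimum reported for the three-variable instance can be checked against the objective (41) as reconstructed above; note also that the linear constraint (42) is active at the optimum ($x^* + y^* + z^* = 6/5$). A minimal sketch (the helper name `f` is ours):

```python
import math

# Objective (41) of the three-variable NLP, as reconstructed above.
def f(x, y, z):
    dx = x - math.cos(2 * math.pi * z) / 5 - 1 / 2
    dy = y - math.sin(2 * math.pi * z) / 5 - 1 / 2
    return (1 + math.sin(math.pi * z ** 2)) * math.exp(-8 * dx ** 2 - 8 * dy ** 2)

x_opt = y_opt = 0.291929
z_opt = 0.616142

# Objective value matches f* = 1.79436.
assert abs(f(x_opt, y_opt, z_opt) - 1.79436) < 1e-3
# Constraint (42) is active at the optimum: x + y + z = 6/5.
assert abs(x_opt + y_opt + z_opt - 6 / 5) < 1e-6
```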

mated, involved in a maximization problem, is non-concave and we have no indication of where the true optimal solution is. Thus, the only way of performing the experiments is to oversample in each region, and this leads to huge MILPs even with only one extra sample point per region. Runs exceeding memory or time limits are thus too frequent to allow a significant comparison. On the other hand, and more structurally, the non-concavity of the function sometimes causes the extra information associated with oversampling to be neglected because of the optimistic attitude of the approximation. Indeed, an extra point in the interior of a region that tightly approximates the value of the function could be ignored in the convex combination if the function is not locally coherent with the objective function of the MILP.

Both these observations suggest that oversampling could be better exploited in a dynamic algorithmic approach that refines the region of interest based on intermediate MILP problems


Table 3. Problem (45)–(47): sizes of the MILP models resulting from different approximations and oversampling, along with the resulting performance.

m  | Model        | Constr. | Var.      | Non-zero   | 0-1 var. | Time (ticks) | Gain HR over UJ (%) | Value of solution | Loss wrt true max. (%)
---|--------------|---------|-----------|------------|----------|--------------|---------------------|-------------------|-----------------------
3  | UJ-log       | 22      | 91        | 613        | 10       | 9.8e-1       | --                  | 3.0824e-1         | 560
3  | HROp-log     | 10      | 85        | 385        | 4        | 5.4e-1       | 45                  | 1.4235e-1         | 1300
3  | UJOp-log-ovr | 22      | 101       | 733        | 10       | 2.74         | --                  | 4.0883e-2         | 4900
3  | HROp-log-ovr | 10      | 95        | 445        | 4        | 1.52         | 45                  | 4.0883e-2         | 4900
5  | UJ-log       | 30      | 639       | 6577       | 14       | 4.343e1      | --                  | 1.184             | 71
5  | HROp-log     | 18      | 633       | 4765       | 8        | 2.385e1      | 45                  | 7.0602e-1         | 190
5  | UJOp-log-ovr | 30      | 649       | 6737       | 14       | 1.391e1      | --                  | 1.7748            | 14
5  | HROp-log-ovr | 18      | 643       | 4865       | 8        | 1.085e1      | 22                  | 1.7748            | 14
9  | UJ-log       | 38      | 6579      | 90,917     | 18       | 4.6079e2     | --                  | 1.7647            | 15
9  | HROp-log     | 26      | 6573      | 71,465     | 12       | 3.2853e2     | 29                  | 1.7647            | 15
9  | UJOp-log-ovr | 38      | 6589      | 91,117     | 18       | 4.597e2      | --                  | 1.8893            | 7.2
9  | HROp-log-ovr | 26      | 6583      | 71,605     | 12       | 3.8919e2     | 15                  | 1.8893            | 7.2
17 | UJ-log       | 46      | 83,543    | 1,458,337  | 22       | 5.9076e3     | --                  | 1.9686            | 2.9
17 | HROp-log     | 34      | 83,537    | 1,208,629  | 16       | 5.1292e3     | 13                  | 1.9686            | 2.9
17 | UJOp-log-ovr | 46      | 83,553    | 1,458,577  | 22       | 5.6261e3     | --                  | 1.998             | 1.3
17 | HROp-log-ovr | 34      | 83,547    | 1,208,809  | 16       | 4.4175e3     | 21                  | 1.998             | 1.3
33 | UJ-log       | 54      | 1,185,947 | 25,188,621 | 26       | 9.1902e4     | --                  | 2.0152            | 0.48
33 | HROp-log     | 42      | 1,185,941 | 21,634,113 | 20       | 7.8199e4     | 15                  | 2.0152            | 0.48
33 | UJOp-log-ovr | 54      | 1,185,957 | 25,188,901 | 26       | 9.1633e4     | --                  | 2.0211            | 0.19
33 | HROp-log-ovr | 42      | 1,185,951 | 21,634,333 | 20       | 6.7869e4     | 26                  | 2.0211            | 0.19

(possibly approximately solved with light heuristic techniques). Such an approach is compatible with triangulation as well, although increasing the triangulation resolution entails a complete reformulation of the model, whereas adding oversampling in an optimistic model can be done by the simple insertion of new variables in existing equations. A compromise would be to use triangulation first and then hyper-rectangular oversampling, so as to locate regions in which the objective function is locally concave or convex, thus allowing a better approximation by means of cheap optimistic oversampling. Developing such an approach is beyond the scope of the current paper and could be the topic of future research.

For the computational evaluation we consider a specific Short-Term Hydro Scheduling instance with 168 time periods, introduced in D'Ambrosio et al. (2010). Table 4 reports, for each value of the sampling parameter $m \in \{10,20,30,40,50\}$, the size of the three MILP models (HR, UJ and IR) with the same notation as in the previous tables. The table refers to the non-logarithmic implementation (which is the only one possible for the IR model).

Table 4 shows, for the HROp-std model, a huge reduction in all indicators with respect to the UJ-std one: both the number of constraints and the number of binary variables are one order of magnitude smaller, while the number of variables and non-zero entries is halved. With respect to the IR model, there is the same drop in the number of constraints, the numbers of binaries and of non-zeros are equivalent, while the overall number of variables is one order of magnitude bigger.

The results of the CPLEX computation are presented in Table 5, where we report, for each value of $m$ and for different time limits (5, 10, 60 CPU minutes on an Intel Core2 CPU 6600, 2.40 gigahertz, 1.94 gigabytes of RAM, running under Linux): (i) the solution value obtained at the time limit (or earlier, if CPLEX could prove optimality), recomputed using the original non-linear function; (ii) the initial and final percentage gaps; (iii) the CPU time in seconds (or "T.L." when the time limit occurred); and (iv) the number of branch-and-bound nodes. Note that, differently from the previous tables, we consider, in Table 5 and in the following ones, CPU times in seconds instead of CPLEX ticks. The reason is that ticks are especially interesting when the computing times are small, so as to be able to appreciate the "granular" differences, while computing times in Table 5 are never negligible (often coinciding with the time limit), thus providing a very neat measure of the performance of the different models.

The results show that size matters. Indeed, the MILP resulting from the UJ model cannot be solved to optimality but for the case $m = 10$ with a time limit of 600 CPU seconds, while all other runs reach the time limit without having proven optimality and, in most cases, without having found any feasible solution at all. Both the HROp-std and the IR models behave quite well, the former being always faster (sometimes much faster) than the latter. This is shown not only by the shorter computing times but also by the number of time limits (1 vs. 6 out of 11 runs). In addition, when both executions hit the time limit ($m = 30$ with 300 CPU seconds), the number of enumerated branch-and-bound nodes for the HROp-std model is roughly four times bigger than that for the IR
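The dynamic refinement idea sketched earlier (refine the region of interest around the incumbent of intermediate problems, then re-approximate) can be illustrated on the two-variable instance (38)–(40). The toy loop below is only an assumption-laden illustration, not the authors' method: a plain feasible-grid search stands in for the intermediate MILPs, and all names (`refine`, `feasible`) are ours.

```python
import math

def f(x, y):
    # Objective (38), as reconstructed above.
    return math.exp(-8 * (x - 1 / 3) ** 2 - 3 * (y - 2 / 3) ** 2)

def feasible(x, y):
    # Constraint (39): stay outside the circle of radius 1/sqrt(10).
    return (x - 1 / 2) ** 2 + (y - 1 / 2) ** 2 >= 1 / 10

def refine(iterations=8, grid=33):
    cx, cy, half = 0.5, 0.5, 0.5          # region of interest: center, half-width
    best_val, best_x, best_y = -1.0, 0.0, 0.0
    for _ in range(iterations):
        # Stand-in for an intermediate MILP: best feasible grid point in the region.
        for i in range(grid):
            for j in range(grid):
                x = min(1.0, max(0.0, cx - half + 2 * half * i / (grid - 1)))
                y = min(1.0, max(0.0, cy - half + 2 * half * j / (grid - 1)))
                if feasible(x, y) and f(x, y) > best_val:
                    best_val, best_x, best_y = f(x, y), x, y
        cx, cy = best_x, best_y           # re-center on the incumbent...
        half *= 0.5                       # ...and shrink the region of interest
    return best_val

value = refine()
# Converges towards f* = 0.973753 from below (only feasible points are evaluated).
assert 0.97 < value <= 0.973754
```

The point of the sketch is only the control flow: each pass produces an incumbent, and refinement concentrates the sampling budget around it instead of uniformly over the whole domain.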

Table 4. Short-Term Hydro Scheduling: sizes of the three MILP models resulting from different approximations (non-logarithmic implementation).

m  | Model     | Constr. | 0-1 var. | Var.      | Non-zero
---|-----------|---------|----------|-----------|----------
10 | HROp-std  | 5544    | 3528     | 20,999    | 130,867
10 | UJ-std    | 18,816  | 34,104   | 51,575    | 229,483
10 | IR        | 25,482  | 3810     | 9467      | 121,365
20 | HROp-std  | 8904    | 6888     | 74,759    | 493,747
20 | UJ-std    | 69,216  | 134,904  | 202,775   | 925,003
20 | IR        | 82,539  | 7107     | 17,741    | 438,384
30 | HROp-std  | 12,264  | 10,248   | 162,119   | 1,091,827
30 | UJ-std    | 153,216 | 302,904  | 454,775   | 2,090,923
30 | IR        | 173,196 | 10,404   | 26,015    | 954,483
40 | HROp-std  | 15,624  | 13,608   | 283,079   | 1,925,107
40 | UJ-std    | 270,816 | 538,104  | 807,575   | 3,727,243
40 | IR        | 297,462 | 13,710   | 34,307    | 1,670,445
50 | HROp-std  | 18,984  | 16,968   | 437,639   | 2,993,587
50 | UJ-std    | 422,016 | 840,504  | 1,261,175 | 5,833,963
50 | IR        | 455,319 | 17,007   | 42,581    | 2,584,884


Table 5. Short-Term Hydro Scheduling: solving the non-logarithmic MILP problems.

m  | Time limit (s) | Model    | Solution value | Initial %gap | Final %gap | CPU time | # Nodes
---|----------------|----------|----------------|--------------|------------|----------|--------
10 | 300            | HROp-std | 31,576.30      | 1.28         | --         | 9.61     | 658
10 | 300            | UJ-std   | 31,576.30      | 1.49         | 0.22       | T.L.     | 7,684
10 | 300            | IR       | 31,576.30      | 11.25        | --         | 21.61    | 30,850
10 | 600            | HROp-std | 31,576.30      | 1.28         | --         | 9.61     | 658
10 | 600            | UJ-std   | 31,576.30      | 1.49         | --         | 304.69   | 13,542
10 | 600            | IR       | 31,576.30      | 11.25        | --         | 21.61    | 30,850
20 | 300            | HROp-std | 31,630.90      | 1.24         | --         | 37.01    | 631
20 | 300            | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 1,121
20 | 300            | IR       | 31,613.80      | 8.80         | 0.02       | T.L.     | 13,557
20 | 600            | HROp-std | 31,630.90      | 1.24         | --         | 37.01    | 631
20 | 600            | UJ-std   | 31,555.10      | 1.40         | 0.60       | T.L.     | 3,699
20 | 600            | IR       | 31,613.80      | 8.80         | 0.01       | T.L.     | 29,978
20 | 3600           | HROp-std | 31,630.90      | 1.24         | --         | 37.01    | 631
20 | 3600           | UJ-std   | 31,582.00      | 1.29         | 0.41       | T.L.     | 35,382
20 | 3600           | IR       | 31,613.80      | 8.80         | --         | 801.99   | 33,833
30 | 300            | HROp-std | 31,633.40      | 1.23         | 0.02       | T.L.     | 5,057
30 | 300            | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 411
30 | 300            | IR       | 31,629.20      | 10.53        | 0.05       | T.L.     | 1,330
30 | 600            | HROp-std | 31,633.40      | 1.23         | --         | 320.28   | 5,356
30 | 600            | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 1,285
30 | 600            | IR       | 31,630.50      | 10.52        | 0.01       | T.L.     | 5,472
30 | 3600           | HROp-std | 31,633.40      | 1.23         | --         | 320.28   | 5,356
30 | 3600           | UJ-std   | 31,475.10      | 1.79         | 0.84       | T.L.     | 4,310
30 | 3600           | IR       | 31,630.50      | 10.52        | --         | 780.84   | 6,017
40 | 600            | HROp-std | 31,639.20      | 1.20         | --         | 265.00   | 929
40 | 600            | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 747
40 | 600            | IR       | 31,379.00      | 13.63        | 9.56       | T.L.     | 997
40 | 3600           | HROp-std | 31,639.20      | 1.20         | --         | 265.00   | 929
40 | 3600           | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 3,284
40 | 3600           | IR       | 31,636.60      | 12.88        | --         | 1,534.82 | 6,370
50 | 3600           | HROp-std | 31,639.50      | 1.26         | --         | 697.48   | 1,473
50 | 3600           | UJ-std   | n/a            | n/a          | n/a        | T.L.     | 1,697
50 | 3600           | IR       | 31,639.50      | 11.47        | 0.50       | T.L.     | 11,354

model. The initial percentage gap for the HROp-std model is slightly better than that for the UJ-std model, and much smaller than that for the IR one. All these indicators clearly show that, as expected, the MILPs associated with the HROp-std modeling are by far the most competitive ones.

Coming to the quality of the approximation, in the unique case in which all MILPs are solved to optimality ($m = 10$ with 600 CPU seconds), the (recomputed) solution value is identical. In general, the (recomputed) solution value for the HROp-std model is better than that for the IR model (when both MILPs are optimally solved). Note that in this case it is not trivial to trace the improvement of the approximation quality when $m$ grows, as we do not know the true optimal value.

Concerning the logarithmic implementation, Table 6 reports, with the usual format, the sizes of the HROp-log and UJ-log models. As expected, the difference in size between the two models collapses: the number of variables (both binary and continuous) is similar (but smaller for the HROp-log modeling), the HROp-log model uses a few less constraints, while the number of non-zero entries, although comparable, shows a clear advantage for the HROp-log model. This results in a very similar performance of the CPLEX solver on the two versions of the MILP, as shown by Table 7.

Finally, in Tables 8 and 9, we compare the MILPs obtained with the HROp-std modeling with those obtained with UJ-log (Vielma & Nemhauser, 2011). Table 8 shows that the sizes of the MILPs resulting from the former modeling are of course bigger than those resulting from the latter, but not much bigger. In fact, the number of constraints and binary variables is considerably larger but, on the other hand, the number of non-zero entries is much smaller. The results in Table 9 show an acceptable difference also in terms of solution quality and CPU time for the solver. The entries in the "% err" column give the error of the solution value with respect to the value of the original non-linear function at the solution point. The solution values for HROp-std are slightly better or iden-

Table 6. Short-Term Hydro Scheduling: sizes of the two MILP models resulting from different approximations (logarithmic implementation).

m  | Model     | Constr. | 0-1 var. | Var.    | Non-zero
---|-----------|---------|----------|---------|-----------
9  | HROp-log  | 3864    | 1512     | 15,791  | 134,395
9  | UJ-log    | 4368    | 1848     | 16,127  | 142,963
17 | HROp-log  | 4536    | 1848     | 51,071  | 552,043
17 | UJ-log    | 5040    | 2184     | 51,407  | 578,419
33 | HROp-log  | 5208    | 2184     | 185,807 | 2,407,771
33 | UJ-log    | 5712    | 2520     | 186,143 | 2,501,683
65 | HROp-log  | 5880    | 2520     | 712,991 | 10,698,571
65 | UJ-log    | 6384    | 2856     | 713,327 | 11,056,243

Table 7. Short-Term Hydro Scheduling: solving the logarithmic MILP problems.

m  | Model    | Solution value | Initial %gap | Final %gap | CPU time | # Nodes
---|----------|----------------|--------------|------------|----------|--------
9  | HROp-log | 31,565.40      | 1.13         | --         | 14.21    | 1,439
9  | UJ-log   | 31,538.70      | 1.14         | --         | 18.69    | 1,723
17 | HROp-log | 31,577.20      | 1.35         | --         | 23.88    | 653
17 | UJ-log   | 31,577.20      | 1.35         | --         | 20.84    | 369
33 | HROp-log | 31,626.20      | 1.24         | --         | 99.90    | 540
33 | UJ-log   | 31,624.10      | 1.25         | --         | 231.99   | 1,531
65 | HROp-log | 31,640.30      | 1.20         | --         | 593.73   | 599
65 | UJ-log   | 31,640.30      | 1.20         | --         | 530.56   | 435

Table 8. Short-Term Hydro Scheduling: comparing the sizes of the MILPs associated with the HROp-std modeling and with UJ-log (logarithmic implementation).

m  | Model    | Constr. | 0-1 var. | Var.    | Non-zero
---|----------|---------|----------|---------|-----------
9  | HROp-std | 5208    | 3192     | 17,471  | 107,515
9  | UJ-log   | 4368    | 1848     | 16,127  | 142,963
17 | HROp-std | 7896    | 5880     | 55,103  | 360,187
17 | UJ-log   | 5040    | 2184     | 51,407  | 578,419
33 | HROp-std | 13,272  | 11,256   | 194,879 | 1,317,115
33 | UJ-log   | 5712    | 2520     | 186,143 | 2,501,683
65 | HROp-std | 24,024  | 22,008   | 732,479 | 5,037,307
65 | UJ-log   | 6384    | 2856     | 713,327 | 11,056,243


Table 9. Short-Term Hydro Scheduling: solving the MILPs associated with HROp-std and UJ-log.

m  | Model    | Solution value | % Error | CPU time | # Nodes
---|----------|----------------|---------|----------|--------
9  | HROp-std | 31,565.40      | 2.34    | 14.71    | 1,507
9  | UJ-log   | 31,538.70      | 2.26    | 18.69    | 1,723
17 | HROp-std | 31,577.20      | 2.31    | 755.96   | 36,507
17 | UJ-log   | 31,577.20      | 2.31    | 20.84    | 369
33 | HROp-std | 31,626.20      | 2.35    | 277.13   | 2,567
33 | UJ-log   | 31,624.10      | 2.35    | 231.99   | 1,531
65 | HROp-std | 31,640.30      | 2.33    | 2,003.18 | 2,088
65 | UJ-log   | 31,640.30      | 2.34    | 530.56   | 435

tical, while the solution times are larger but comparable. This is a significant advantage for the HR modeling because going logarithmic is definitely not for free in terms of complexity of the implementation.

5. Conclusions

We have discussed a novel piecewise linear approximation of non-linear optimization problems that, with respect to classical triangulations, guides the choice of the simplexes through one of two approximating objective functions. We have investigated theoretical properties of such functions and established upper bounds on the error they can produce. Moreover, we have shown that the number of artificial binary variables required within an MILP model is dramatically reduced with respect to the UJ triangulations. The proposed approach allows the use of recent methods for representing such variables logarithmically. Computational experiments highlighted the impact of the method when embedded in MILP models.

Acknowledgment

This work was mostly developed when the second author was working at DEI, University of Bologna, whose support is gratefully acknowledged. The third and fourth authors acknowledge the support of MIUR under the PRIN 2009 research program. We are indebted to two anonymous referees for a careful reading and useful suggestions.

References

Babayev, D. A. (1997). Piece-wise linear approximation of functions of two variables. Journal of Heuristics, 2, 313–320.
Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., & Mahajan, A. (2013). Mixed-integer nonlinear optimization. Acta Numerica, 22, 1–131.
Borghetti, A., D'Ambrosio, C., Lodi, A., & Martello, S. (2008). An MILP approach for short-term hydro scheduling and unit commitment with head-dependent reservoir. IEEE Transactions on Power Systems, 23, 1115–1124.
D'Ambrosio, C., Lodi, A., & Martello, S. (2010). Piecewise linear approximation of functions of two variables in MILP models. Operations Research Letters, 38, 39–46.
Irion, J., Lu, J.-C., Al-Khayyal, F., & Tsao, Y.-C. (2012). A piecewise linearization framework for retail shelf space management models. European Journal of Operational Research, 222, 122–136.
Jensen, J. L. W. V. (1906). Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30, 175–193.
Kuhn, H. W. (1960). Some combinatorial lemmas in topology. IBM Journal of Research and Development, 4, 518–524.
Lee, J., & Wilson, D. L. (2001). Polyhedral methods for piecewise-linear functions I: The lambda method. Discrete Applied Mathematics, 108, 269–285.
Rovatti, R., Borgatti, M., & Guerrieri, R. (1998). A geometric approach to maximum-speed n-dimensional continuous linear interpolation in rectangular grids. IEEE Transactions on Computers, 47, 894–899.
Silva, T. L., & Camponogara, E. (2014). A computational analysis of multidimensional piecewise-linear models with applications to oil production optimization. European Journal of Operational Research, 232, 630–642.
Strang, G., & Fix, G. (1973). An analysis of the finite element method. Englewood Cliffs, NJ: Prentice Hall.
Tawarmalani, M., Richard, J.-P., & Xiong, C. (2013). Explicit convex and concave envelopes through polyhedral subdivisions. Mathematical Programming, 138, 531–577.
Todd, M. J. (1977). Union Jack triangulations. In S. Karamardian (Ed.), Fixed points: Algorithms and applications (pp. 315–336). Academic Press.
Vielma, J. P., & Nemhauser, G. L. (2011). Modeling disjunctive constraints with a logarithmic number of variables and constraints. Mathematical Programming, 128, 49–72.
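The piecewise linear approximation discussed in the conclusions above can be illustrated, in the much simpler univariate case, by a short sketch; this is our own illustrative code (function names are ours, and the paper's error bounds concern the two-variable triangulated case, which this one-dimensional fragment only hints at):

```python
import math

def pwl_approx(f, lo, hi, segments):
    """Build a piecewise-linear interpolant of f on a uniform grid with the
    given number of segments, and return an evaluator phi(x)."""
    xs = [lo + i * (hi - lo) / segments for i in range(segments + 1)]
    ys = [f(x) for x in xs]

    def phi(x):
        # locate the segment containing x and interpolate linearly
        i = min(int((x - lo) / (hi - lo) * segments), segments - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]

    return phi
```

For a smooth function, doubling the number of segments roughly quarters the maximum interpolation error, which is the kind of behavior the error bounds in the paper quantify precisely.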

Computers & Operations Research 38 (2011) 505–513


Heuristic algorithms for the general nonlinear separable knapsack problem

Claudia D'Ambrosio, Silvano Martello

DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Available online 3 August 2010

Keywords: Nonlinear knapsack; Nonconvexity; Separable knapsack; Heuristic; Local search; Mixed integer nonlinear programming

Abstract: We consider the nonlinear knapsack problem with separable nonconvex functions. Depending on the assumption on the integrality of the variables, this problem can be modeled as a nonlinear programming or as a (mixed) integer nonlinear programming problem. In both cases, this class of problems is very difficult to solve, both from a theoretical and a practical viewpoint. We propose a fast heuristic algorithm, and a local search post-optimization procedure. A series of computational comparisons with a heuristic method for general nonconvex mixed integer nonlinear programming and with global optimization methods shows that the proposed algorithms provide high-quality solutions within very short computing times.

1. Introduction

The general nonlinear knapsack problem is

max f(x)        (1)
g(x) ≤ c        (2)
x ∈ D           (3)

where x = (x1, ..., xn) ∈ Rⁿ, f(x) and g(x) are continuous differentiable functions, c is a nonnegative value, and D ⊆ Rⁿ. In this paper we consider the case where both f(x) and g(x) are separable functions, and D includes bounds and integrality requirements on (part of) the variables. The resulting problem can thus be described, using the knapsack terminology, as follows. Given n items j, each having a profit function fj(xj) and a weight function gj(xj), associated with n variables xj limited by upper bounds uj (j ∈ N = {1, ..., n}), determine nonnegative xj values such that the total weight does not exceed a given capacity c and the total produced profit is a maximum. A subset N̄ of the xj variables can be restricted to take integer values. Formally, we consider the nonlinear knapsack (NLK) problem

max Σ_{j∈N} fj(xj)                      (4)
Σ_{j∈N} gj(xj) ≤ c                      (5)
0 ≤ xj ≤ uj    ∀j ∈ N = {1, ..., n}     (6)
xj integer     ∀j ∈ N̄ ⊆ N              (7)

where, as is common in the knapsack literature, fj(xj) and gj(xj) are nonlinear nonnegative nondecreasing functions and the unique constraint is the capacity constraint (5). We have no further assumption on functions fj(xj) and gj(xj), i.e., they can be nonconvex and nonconcave. Cases in which lower bounds lj (different from zero) on the variable values are also given can be handled by replacing, for j ∈ N, xj with yj + lj, so (6) takes the form 0 ≤ yj ≤ uj − lj.

These problems classically arise, e.g., in resource-allocation contexts, where a limited resource of amount c (such as an advertising budget) has to be partitioned among different categories j ∈ N (such as advertisements for different products). The objective function is the overall expected return, in terms of sales, from all categories. The nonconvexity arises because the return sharply increases with the amount of allocation when the advertisements are noticed by an increasing number of consumers who are more and more motivated to buy. On the other hand, at some point saturation occurs, and an increase of advertisement no longer leads to a significant increase in sales. The situation is depicted in Fig. 1. If xj denotes the amount of advertisement allocated to category j (j ∈ N), fj(xj) the expected return in terms of sales of category j, and gj(xj) = xj the corresponding cost for advertising, the resulting optimization problem is then modeled by NLK.

The well-known (linear) knapsack problem (see [10,9]) is the special case of NLK arising when fj(xj) and gj(xj) are linear integer functions, i.e., fj(xj) = pj xj and gj(xj) = wj xj ∀j ∈ N, and N̄ = N. It follows that NLK is NP-hard.

Nonlinear knapsack problems have many applications in various fields such as, e.g., portfolio selection, stratified sampling, production planning, resource distribution. We refer the reader to the book by Ibaraki and Katoh [6] and to the survey by Bretthauer and Shetty [2] for extensive reviews.

doi:10.1016/j.cor.2010.07.010
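Formulation (4)–(7) can be sketched in code as follows. This is our own illustrative fragment, not the authors' implementation: the class `NLKInstance` and its method names are ours, and the profit/weight callables stand for the fj and gj above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class NLKInstance:
    """Nonlinear knapsack: max sum_j fj(xj) s.t. sum_j gj(xj) <= c,
    0 <= xj <= uj, with xj integer for j in integer_items (the set N-bar)."""
    f: List[Callable[[float], float]]   # profit functions fj
    g: List[Callable[[float], float]]   # weight functions gj
    u: List[float]                      # upper bounds uj
    c: float                            # capacity
    integer_items: Set[int]             # indices restricted to integer values

    def is_feasible(self, x: List[float], tol: float = 1e-9) -> bool:
        # bound constraints (6)
        if any(xj < -tol or xj > self.u[j] + tol for j, xj in enumerate(x)):
            return False
        # integrality constraints (7)
        if any(abs(x[j] - round(x[j])) > tol for j in self.integer_items):
            return False
        # capacity constraint (5)
        return sum(gj(xj) for gj, xj in zip(self.g, x)) <= self.c + tol

    def objective(self, x: List[float]) -> float:
        # objective (4)
        return sum(fj(xj) for fj, xj in zip(self.f, x))
```

With linear fj and gj this reduces to the classical knapsack problem mentioned above.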

[Fig. 1. Example of profit function.]

A number of nonlinear separable knapsack problems has been considered in the literature. In most cases the proposed algorithms are specifically tailored to the case of convex or concave functions (see, e.g., the recent article by Zhang and Hua [12], who consider a minimization case in which both the fj(xj) and the gj(xj) functions are convex). In our model neither convexity nor concavity is assumed, a situation rarely treated in the literature (apart from contributions on general global optimization, see Horst and Tuy [5]).

A special case, in which N̄ = N and the xj variables can only take a limited set of prefixed feasible values, was considered by Ohtagaki et al. [11], who developed dominance criteria aimed at reducing the number of feasible values, and a specifically tailored greedy algorithm that recursively allocates capacity units to nondominated items and updates the set of dominated items. Another special case, in which N̄ = N, the fj(xj) functions are piecewise linear and the gj(xj) functions are linear, was recently considered by Kameshwaran and Narahari [8], who developed polynomial and pseudo-polynomial time approximation algorithms.

In Section 2 we present a constructive heuristic algorithm for NLK based on a discretization of the solution space. In Section 3 we describe post-processing improvement heuristics based on local search procedures that operate on pairs of variables. The resulting approach is experimentally compared with open source nonlinear programming solvers [1,7] which provide heuristic solutions to nonconvex problems. The quality of the solutions is experimentally evaluated by comparison with the upper bounds produced by general global optimization solvers (Couenne [3] and SC-MINLP, developed by D'Ambrosio et al. [4]). The computational results, reported in Section 4, show that the proposed approach provides high-quality solutions within very small CPU times compared to the other solvers used for the comparisons, even for large-size instances of the problem. Conclusions follow in Section 5.

2. Constructive heuristic

Before giving a detailed statement of the heuristic, we describe its main steps. The algorithm starts by computing profit and weight values over a discretized solution space. Let s denote the number of samplings (identical for all functions), so dj = uj/s is the sampling step, and define

fjk = fj(k dj) and gjk = gj(k dj)   (j ∈ N; k = 1, ..., s).   (8)

We then obtain the profit-to-weight ratios as

rjk = fjk / gjk   (j ∈ N; k = 1, ..., s)   (9)

and we can compute, for each item j, the maximum ratio

rj,m(j) = max_{k=1,...,s} {rjk}.   (10)

Assume by simplicity that the items are sorted according to nonincreasing rj,m(j) values, so, with respect to the current sampling step, r1,m(1) is the best available ratio and m(1) d1 is the value of variable x1 that produces the best filling of the first g1,m(1) capacity units. Consider now the second best item 2, and its best ratio r2,m(2) (≤ r1,m(1)), and observe that item 1 could have a better ratio than item 2 also for a higher x1 value. Specifically, we can find the highest m'(1) value (with m'(1) ∈ {m(1), ..., s}) such that r1,m'(1) ≥ r2,m(2). The idea is then to define x1 = m'(1) d1, i.e., to use item 1 for the first g1,m'(1) capacity units, then to continue with the residual capacity c̄ = c − g1,m'(1). At the second iteration, item 2 (second best available ratio) is used for the next g2,m'(2) capacity units, where m'(2) is analogously defined as the highest value such that r2,m'(2) ≥ r3,m(3), and so on.

The method is further improved by refining the search for the best value m'(j) of the current xj as follows. Once m'(j) has been computed, the sampling step is decreased, say, to δj = dj/σ (where σ is a prefixed refinement parameter), and new profit-to-weight ratios of item j are computed, for xj = m'(j) dj + δj s (s = 1, 2, ..., σ), and compared with rj+1,m(j+1), to obtain a more precise xj value. Such refining process can be iterated by further decreasing the sampling step.

Note that, whenever an xj is defined, the ratios rl,m(l) of the unscanned items need to be updated, if the resulting residual capacity is insufficient for their best ratio, i.e., if gl,m(l) > c̄. In such a case the first and second best available ratios can change. In addition, by assuming that the current best ratios correspond to items j and j+1, when looking for the m'(j) value, the residual capacity has to be taken into account, i.e., m'(j) must be defined as the highest value such that rj,m'(j) ≥ rj+1,m(j+1) and gj,m'(j) ≤ c̄.

The algorithm performs a final step when one of the two cases occurs:

1. all items but one have been assigned an xj value: in this case all the residual capacity is used for the last item, i.e., we find the largest feasible xn such that gn(xn) does not exceed the residual capacity;
2. the residual capacity is insufficient for allocating the minimum sampled weight of any unscanned item, i.e., c̄ < gj1 for each unscanned j: in this case the best between two options is adopted, namely (let j be the last scanned item):
   (a) the whole residual capacity is used for the first unscanned item, i.e., we find the largest feasible xj+1 such that gj+1(xj+1) does not exceed the residual capacity;
   (b) xj is increased as much as possible, i.e., we find the largest feasible xj such that gj(xj) does not exceed the residual capacity. If some capacity still remains, this is used for the first unscanned item, i.e., we find the largest feasible xj+1 such that gj+1(xj+1) does not exceed the new residual capacity.

In all cases above, some residual capacity can remain, if the updating is limited by the upper bounds. In this case a simple greedy search scans the items of N in any order, and, for each item j, increases the value of xj as much as possible.

Note that, to satisfy the integrality requirements for items in N̄, it is sufficient to only consider integer valued sampling steps dj for all j ∈ N̄.

The detailed statement of the method is given in Algorithm 1. As already observed, whenever a new xj is defined, the ratios rl,m(l) of the unscanned items need in general to be updated and re-sorted. For this reason, in order to achieve a better efficiency, the items are not actually sorted at each iteration, but only the two

items a and b providing the maximum and second maximum ratios, ra,m(a) and rb,m(b), are identified in linear time.

Algorithm 1. Heur(s, n_ref, σ).

1:  dj := uj/s for all j ∈ N; dj := max(1, ⌊dj⌋) for all j ∈ N̄;
2:  for each j ∈ N and k ∈ {1, ..., s} do compute rjk according to (9);
3:  for each j ∈ N do compute m(j) according to (10);
4:  c̄ := c; Ñ := N {comment: unscanned items}; z := 0 {comment: incumbent solution value};
5:  while c̄ > 0 do
6:    find the best item a, i.e., ra,m(a) = max_{j∈Ñ} {rj,m(j)}; set xa := da m(a); Ñ := Ñ \ {a};
7:    update rjk and m(j) for all j ∈ Ñ by considering c̄ − ga,m(a) as the current capacity;
8:    if gj1 > c̄ − ga,m(a) for all j ∈ Ñ then m(j) := 1 for all j ∈ Ñ;
9:    find the next best item b, i.e., rb,m(b) = max_{j∈Ñ} {rj,m(j)};
10:   ref := 0; r := ra,m(a)+1; w := ga,m(a)+1;
11:   while ref ≤ n_ref do {comment: main loop}
12:     while (r ≥ rb,m(b)) && (xa + da ≤ ua) && (w ≤ c̄) do
13:       xa := xa + da; w := ga(xa + da); r := fa(xa + da)/w;
14:     end while;
15:     if xa = ua then break;
16:     else
17:       ref := ref + 1; da := da/σ (and set da := ⌊da⌋ if a ∈ N̄);
18:       if da = 0 then break else w := ga(xa + da); r := fa(xa + da)/w;
19:     end if;
20:   end while;
21:   z := z + fa(xa); c̄ := c̄ − ga(xa); update rjk and m(j) for all j ∈ Ñ;
22:   if |Ñ| = 1 (say, Ñ = {j}) then {comment: final step, case (1)}
23:     if gjs ≤ c̄ then xj := uj; else find xj such that gj(xj) = c̄;
24:     c̄ := c̄ − gj(xj); z := z + fj(xj);
25:     break;
26:   end if;
27:   if gj1 > c̄ for all j ∈ Ñ then {comment: final step, case (2)}
28:     find db such that gb(db) = c̄ (and set db := ⌊db⌋ if b ∈ N̄); Ñ := Ñ \ {b};
29:     if gas ≤ c̄ + ga(xa) then da := ua − xa;
30:     else
31:       find da such that ga(xa + da) = c̄ + ga(xa) (and set da := ⌊da⌋ if a ∈ N̄);
32:     end if;
33:     if ga(xa + da) = c̄ then d'b := 0; else find d'b such that gb(d'b) = c̄ − (ga(xa + da) − ga(xa));
34:     if b ∈ N̄ then d'b := ⌊d'b⌋;
35:     if fa(xa + da) + fb(d'b) < fa(xa) + fb(db) then {comment: final step, option (2)(a)}
36:       z := z + fb(db); c̄ := c̄ − gb(db); xb := db;
37:     else {comment: final step, option (2)(b)}
38:       z := z + (fa(xa + da) − fa(xa)) + fb(d'b);
39:       c̄ := c̄ − (ga(xa + da) − ga(xa)) − gb(d'b); xa := xa + da; xb := d'b;
40:     end if;
41:     break;
42:   end if;
43: end while;
44: for each j ∈ Ñ do xj := 0; z := z + fj(0);
45: while c̄ > 0 do {comment: final greedy search to complete the solution}
46:   let j be the next item in the original set N;
47:   if gjs ≤ c̄ + gj(xj) then dj := uj − xj;
48:   else
49:     find dj such that gj(xj + dj) = c̄ + gj(xj) (and set dj := ⌊dj⌋ if j ∈ N̄);
50:   end if;
51:   z := z + (fj(xj + dj) − fj(xj)); c̄ := c̄ − (gj(xj + dj) − gj(xj)); xj := xj + dj;
52: end while;
53: return x.

At each iteration of the main while loop of statement 5 a new xj variable is defined, hence the loop is executed at most n times. Each iteration requires to find the maximum and second maximum ratios, which takes O(n) time. If the parameters used for the refinements (s, n_ref and σ) are constants, as happens in practical implementations (see Section 4), the time complexity of Heur is O(n²). (We are assuming here that the computation of the zero of a weight function (see statements 23, 28, 31, 33 and 49) takes a CPU time bounded by a constant independent of the input size.)

3. Local search

As will be seen in Section 4, Algorithm Heur produces, on average, solutions of good quality within short CPU times. Nevertheless, it is worth also using a post-processing swap local search algorithm to improve them. In this case too we first give the general idea, and then a detailed algorithmic statement.

Let x̄i and x̄j be the current values of variables xi and xj, and let c̄ denote the current residual capacity. Consider a new sampling step e, smaller than the sampling steps di and dj used in Algorithm Heur, and define the two potential modifications:

(A) (xi := x̄i + e and xj := x̄j − e);
(B) (xi := x̄i − e and xj := x̄j + e).

Assume by simplicity that such modifications are feasible, i.e., that

x̄i + e ≤ ui, x̄j + e ≤ uj, x̄i − e ≥ 0, x̄j − e ≥ 0;
(gi(x̄i + e) − gi(x̄i)) + (gj(x̄j − e) − gj(x̄j)) ≤ c̄, and
(gi(x̄i − e) − gi(x̄i)) + (gj(x̄j + e) − gj(x̄j)) ≤ c̄.

Now correspondingly define the two potential objective function improvements

Δ(A) = (fi(x̄i + e) − fi(x̄i)) + (fj(x̄j − e) − fj(x̄j));
Δ(B) = (fi(x̄i − e) − fi(x̄i)) + (fj(x̄j + e) − fj(x̄j)),

and let Δ = max(Δ(A), Δ(B)). If Δ > 0 then the corresponding modification ((A) or (B)) improves the current solution. (The case of an unfeasible modification X ∈ {A, B} is handled by setting Δ(X) = 0.)

Assume, without loss of generality, that the (A) modification above has been accepted. It is then possible to further increase xi by e and decrease xj by e, continuing while Δ(A) > 0 and the modification remains feasible. When the next modification is not acceptable, the process can be iterated with a new pair of items.
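A single swap move of the kind just described can be sketched as follows. This is our own simplified fragment: unlike Algorithm 2 below, the step e is given as input rather than computed by zero-finding, and all names are ours.

```python
def try_swap(f, g, u, x, c_res, i, j, e):
    """Attempt modification (A): x[i] += e, x[j] -= e, and (B): x[i] -= e,
    x[j] += e. Apply the more profitable feasible one; return True if x
    changed. c_res is the current residual capacity."""
    def gain(a, b):
        # objective improvement of (x[a] += e, x[b] -= e), or 0.0 if infeasible
        if not (x[a] + e <= u[a] and x[b] - e >= 0):
            return 0.0
        extra_weight = ((g[a](x[a] + e) - g[a](x[a]))
                        + (g[b](x[b] - e) - g[b](x[b])))
        if extra_weight > c_res:
            return 0.0
        return ((f[a](x[a] + e) - f[a](x[a]))
                + (f[b](x[b] - e) - f[b](x[b])))

    delta_A, delta_B = gain(i, j), gain(j, i)
    if max(delta_A, delta_B) <= 0:
        return False
    if delta_A >= delta_B:
        x[i] += e
        x[j] -= e
    else:
        x[j] += e
        x[i] -= e
    return True
```

Repeating the accepted move while it keeps improving, and then passing to a new pair of items, gives the loop structure of the local search.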

The detailed statement of the local search procedure is provided in Algorithm 2. The input parameter d provides the fraction of the sampling step dj (j ∈ N) to be used within the local search for item j. The selection of the next item pair to be considered (step 2) was implemented in different ways:

- in a deterministic implementation, denoted as LS_D in the following, the items were considered by increasing i = 1, ..., n−1 and j = i+1, ..., n;
- in a randomized implementation, denoted as LS_R in the following, the item pairs were uniformly randomly generated.

A third possible implementation was attempted too, in which, at each iteration, all item pairs were tested and the modification producing the best improvement accepted. This version however computationally proved to require consistently higher computing times, with very little advantage in terms of improvement.

By assuming again that the computation of the zero of a function (see statements 4, 6, 7 and 9) can be done in constant time, the local search clearly takes O(n²) time.

Algorithm 2. LS(d).

1:  repeat
2:    let (i, j) be the next pair of items to be considered;
3:    repeat
4:      find e_i^A such that gi(x̄i + e_i^A) = c̄ + gj(x̄j) + gi(x̄i);
5:      e_i^A := min(e_i^A, di/d, x̄j, ui − x̄i);
6:      find e_j^A such that gj(x̄j − e_j^A) = c̄ + gj(x̄j) + gi(x̄i) − gi(x̄i + e_i^A);
7:      find e_j^B such that gj(x̄j + e_j^B) = c̄ + gj(x̄j) + gi(x̄i);
8:      e_j^B := min(e_j^B, dj/d, x̄i, uj − x̄j);
9:      find e_i^B such that gi(x̄i − e_i^B) = c̄ + gj(x̄j) + gi(x̄i) − gj(x̄j + e_j^B);
10:     Δ(A) := (fi(x̄i + e_i^A) − fi(x̄i)) + (fj(x̄j − e_j^A) − fj(x̄j));
11:     Δ(B) := (fi(x̄i − e_i^B) − fi(x̄i)) + (fj(x̄j + e_j^B) − fj(x̄j));
12:     if max(Δ(A), Δ(B)) > 0 then
13:       if Δ(A) > Δ(B) then
14:         x̄i := x̄i + e_i^A; x̄j := x̄j − e_j^A;
15:       else
16:         x̄i := x̄i − e_i^B; x̄j := x̄j + e_j^B;
17:       end if;
18:     end if;
19:   until max(Δ(A), Δ(B)) ≤ 0;
20: until all pairs of items have been considered;
21: return x.

4. Computational experiments

The algorithms introduced in the previous sections were experimentally evaluated on different sets of randomly generated functions.

Most of the computational tests (Tables 1–12) were performed by generating the profits according to the function

fj(xj) = cj / (1 + bj e^{−aj(xj + dj)}),   (11)

taken from [4], where it was introduced to simulate the resource-allocation problem described in Section 1. A typical shape of such a function has been shown in Fig. 1. Different actual functions were obtained, for each item j, by randomly generating aj in the interval [0.1, 0.2], bj and cj in the interval [0, 100], and dj in the interval [−100, 0]. The upper bounds uj on the xj values were always set to 100.

Different weight functions were used for different tests. A first test bed (Tables 1–4) was obtained by generating the weights according to the concave increasing functions

gj(xj) = √(pj xj + qj) − √qj,   (12)

obtaining different actual functions, for each item j, by randomly generating pj and qj in the interval [1, 20]. In order to produce challenging instances, the capacity was set to c = ½ Σ_{j∈N} gj(uj) (a commonly used value in the knapsack literature).

A second test bed, used for Tables 5–8, was produced using linear nondecreasing weight functions

gj(xj) = wj xj   (13)

by generating the wj values uniformly randomly in the interval [1, 100]. Similarly to the previous case, the capacity was set to c = ½ Σ_{j∈N} wj uj.

For the third test bed (Tables 9–12) we used the special case of (13) produced by unit weights, i.e., gj(xj) = xj, with capacity c randomly generated in the interval [100, 100n].

As a last test bed (Tables 13–16) we adopted the data generation used in Zhang and Hua [12], consisting of convex fj(xj) and gj(xj) functions, that were modified, as shown in Section 1, so as to include the lower bounds on the variables given in [12], obtaining

fj(xj) = aj((xj + lj) − dj)²  and  gj(xj) = cj(xj + lj),   (14)

with aj and cj uniformly random in [1, 2], lj uniformly random in [4, 6], dj uniformly random in [4, lj], upper bounds on the xj values uniformly random in [9 − lj, 11 − lj], and capacity set to c = (Σ_{j∈N} cj Σ_{j∈N} dj)/n. (Additionally, the dj values were modified in order to produce nondecreasing fj(xj) functions.)

For each test bed, we considered values n ∈ {10, 20, 50, 100, 200, 500, 1000, 2500}. For each value of n, two sets of 20 instances each were generated, one for the case N̄ = ∅ (real variables), and one for the case N̄ = N (integer variables). The total number of tested instances was thus 1280. All test instances and solutions are available on the web at http://www.or.deis.unibo.it/research_pages/ORinstances/NLKP.html.

The proposed algorithms were compared with a number of solvers, namely:

- heuristic solvers: Ipopt [7] for real instances, Bonmin [1] for integer instances;
- exact solvers: Couenne [3] and SC-MINLP [4], both for real and integer instances.

All the experiments were performed on an Intel Core 2, CPU 6600, 2.4 GHz, 1.94 GB RAM, using only one processor. The exact solver SC-MINLP turned out to always dominate Couenne, so we do not report the results for the latter. SC-MINLP was run twice, with 5 min and 1 h CPU time limits per instance, respectively. The heuristic solvers Ipopt and Bonmin were run with a time limit of one CPU hour per instance. The same time limit was used for Heur, which however never exceeded 1 min of CPU time. For each execution of the local search algorithms LS_D and LS_R, a time limit of 5 CPU seconds was imposed.

Concerning the heuristic solvers, it must be noted that Ipopt and Bonmin are exact solvers for convex nonlinear programs and convex mixed-integer nonlinear programs, respectively, but they can be used as heuristics for nonconvex problems. In such a case, they can produce different values depending on the (provided) starting point. In order to better analyze their behavior, they were executed both with a single random starting point, and with 10 random starting points (taking the best solution). Concerning the

Table 1 Nonlinear weights, real variables.

n      Heur       LS_D       LS_R       Ipopt_1    Ipopt_10   SC(5')     SC(1h)     SC_UB

10     297.81     300.76     300.77     262.41     298.20     309.44     309.44     309.44
20     595.59     600.78     600.30     482.12     556.27     606.73     606.73     606.73
50     1600.36    1608.99    1609.11    1298.03    1398.54    1612.02    1612.06    1625.10
100    3248.27    3262.23    3262.60    2540.13    2696.78    3261.79    3262.53    3323.66
200    6564.74    6586.50    6586.88    5091.35    5317.59    6452.68    6580.43    6776.05
500    16,386.61  16,443.68  16,443.81  12,586.85  13,013.53  13,307.12  16,070.09  16,978.17
1000   32,941.26  33,066.03  33,065.84  25,334.00  25,978.39  n/a        32,266.69  34,194.21
2500   81,742.56  82,099.94  82,099.55  63,310.08  64,128.79  n/a        n/a        n/a

Average solution values over 20 instances. The best results are underlined.

Table 2 Nonlinear weights, real variables.

n      Heur       LS_D       LS_R       Ipopt_1    Ipopt_10   SC(5')     SC(1h)

10     0.0015     0.0270     0.0260     0.0500     0.2500     45.3000    45.3000
20     0.0040     0.0755     0.0750     0.0500     0.4500     194.7000   548.5500
50     0.0180     0.4220     0.4300     0.0500     1.0000     303.0500   3558.4500
100    0.0685     1.6450     1.7240     0.2000     2.4500     303.7000   3625.1000
200    0.2655     6.4200     6.7460     0.5500     6.2500     433.7500   3628.9000
500    1.6510     39.9220    45.6275    2.4000     27.2500    315.6500   3674.7000
1000   6.6380     156.4915   160.3200   8.2000     91.5500    n/a        3678.8000
2500   43.4220    160.3200   160.3200   41.2500    439.9500   n/a        n/a

Average CPU seconds over 20 instances.

Table 3 Nonlinear weights, integer variables.

n      Heur       LS_D       LS_R       Bonmin_1        Bonmin_10       SC(5')     SC(1h)     SC_UB

10     297.31     300.56     301.38     260.48          301.10          309.12     309.12     309.12
20     595.19     600.58     600.29     504.18          555.00          606.21     605.97     607.98
50     1599.67    1608.67    1608.49    1341.10         1463.11         1580.92    1592.14    1657.10
100    3246.83    3261.46    3261.79    2743.76         2935.19         2585.95    3203.35    3348.79
200    6561.43    6584.83    6585.72    1667.13 (14)    1766.79 (14)    n/a        n/a        n/a
500    16,378.68  16,438.61  16,440.14  0.00 (20)       0.00 (20)       n/a        n/a        n/a
1000   32,921.17  33,053.06  33,054.79  0.00 (20)       0.00 (20)       n/a        n/a        n/a
2500   81,689.36  82,067.41  82,069.32  0.00 (20)       0.00 (20)       n/a        n/a        n/a

Average solution values over 20 instances (# no solution). The best results are underlined.

Table 4 Nonlinear weights, integer variables.

n      Heur       LS_D       LS_R       Bonmin_1   Bonmin_10  SC(5')     SC(1h)

10     0.0020     0.0225     0.0230     0.7500     0.9000     141.9000   383.4500
20     0.0045     0.0660     0.0690     6.8000     2.2000     316.4500   2783.8000
50     0.0180     0.4025     0.4030     30.3500    29.3500    346.0000   3731.8500
100    0.0705     1.5820     1.6320     380.8500   423.0500   495.9000   3819.8000
200    0.2685     6.3140     6.7145     444.3500   490.1500   n/a        n/a
500    1.6525     40.5555    45.3555    71.7000    73.0000    n/a        n/a
1000   6.6595     155.2590   160.3200   108.8000   108.4000   n/a        n/a
2500   43.7515    160.3200   160.3200   307.7000   310.7500   n/a        n/a

Average CPU seconds over 20 instances.

exact solvers, as mentioned above, in all the experiments Couenne was largely dominated by SC-MINLP, and could never solve to optimality instances with n > 10 within one CPU hour.

The proposed heuristics can produce different solutions according to the values of the input parameters s, n_ref, σ and d (see Algorithms 1 and 2). For each instance, they were run several times for different input configurations (with 5 s CPU time limit per execution for the local search). On the basis of preliminary computational experiments, the attempted values were s ∈ {1, 10, 50, 100} and pairs (n_ref, σ) ∈ {(0, −), (1, 2), (1, 5), (1, 10), (2, 2), (3, 2), (4, 2), (5, 2)}, for a total of 32 configurations. The value of parameter d was always set to 2. Preliminary experiments showed that none of the configurations dominates the others. Hence all were run, and the best solution was selected. The CPU times reported in the tables give the cumulative time spent.

When the computation of the zero of a weight function was needed, this was obviously done with a single statement for the cases

Table 5 Linear weights, real variables.

n      Heur       LS_D       LS_R       Ipopt_1    Ipopt_10   SC(5')     SC(1h)     SC_UB

10     280.08     286.06     285.87     255.88     282.88     289.12     289.12     289.12
20     648.32     653.44     653.51     565.48     620.41     658.80     658.80     658.84
50     1676.62    1690.02    1690.91    1429.68    1528.25    1691.98    1691.98    1692.00
100    3358.78    3388.42    3388.84    2780.77    2920.77    3392.23    3392.23    3392.23
200    6824.79    6906.06    6906.35    5725.01    5904.28    6925.55    6925.55    6925.81
500    16,931.91  17,188.00  17,183.49  14,136.49  14,511.03  n/a        17,275.50  17,291.81
1000   33,660.48  34,252.91  34,239.55  28,125.13  28,710.02  n/a        34,465.65  34,520.94
2500   83,513.24  85,234.88  85,167.75  69,784.68  71,151.71  n/a        69,927.58  86,000.95

Average solution values over 20 instances. The best results are underlined.

Table 6 Linear weights, real variables.

n      Heur       LS_D       LS_R       Ipopt_1    Ipopt_10   SC(5')     SC(1h)

10     0.0015     0.0035     0.0030     0.0000     0.2000     4.0526     4.0526
20     0.0040     0.0060     0.0080     0.0500     0.3000     2.7895     2.7895
50     0.0175     0.0355     0.0400     0.0500     0.6500     4.0000     4.0000
100    0.0685     0.1275     0.1735     0.1500     1.4000     16.2105    16.2105
200    0.2650     0.4855     0.8550     0.3000     3.1500     143.7000   401.6316
500    1.6700     2.9575     8.6600     1.4000     13.2000    n/a        3646.1579
1000   6.7900     11.8380    57.0825    4.4000     48.7500    n/a        3680.3684
2500   43.0715    72.5295    160.3200   30.1500    357.7000   n/a        4893.3000

Average CPU seconds over 20 instances.

Table 7 Linear weights, integer variables.

n      Heur       LS_D       LS_R       Bonmin_1        Bonmin_10       SC(5')     SC(1h)        SC_UB

10     280.69     286.08     285.03     249.43          283.21          287.34     287.34        287.34
20     648.93     653.65     653.78     516.64 (2)      633.11          657.61     657.61        657.61
50     1676.66    1690.52    1690.95    1467.90         1544.50         1686.12    1688.91       1689.92
100    3360.28    3389.11    3389.14    2953.55         3060.09         2756.49    3271.28       3397.49
200    6822.76    6906.50    6907.85    5085.47 (3)     5637.72 (2)     2765.22    5270.64       6987.74
500    16,915.56  17,185.24  17,184.23  2272.28 (17)    2280.72 (17)    n/a        5919.26 (4)   17,313.60
1000   33,617.79  34,243.26  34,246.25  0.00 (20)       1499.91 (19)    n/a        n/a           n/a
2500   83,404.46  85,228.34  85,164.41  0.00 (20)       0.00 (20)       n/a        n/a           n/a

Average solution values over 20 instances (# no solution). The best results are underlined.

Table 8 Linear weights, integer variables.

n      Heur       LS_D       LS_R       Bonmin_1   Bonmin_10   SC(5')     SC(1h)

10     0.0015     0.0040     0.0040     0.5000     0.6000      2.1500     2.1052
20     0.0035     0.0070     0.0100     1.3000     2.2500      29.2500    220.7894
50     0.0195     0.0355     0.0385     20.2500    8.1000      320.5000   1241.1000
100    0.0765     0.1390     0.1815     130.8000   142.2500    365.6000   3564.2500
200    0.2955     0.5240     0.9050     1341.5500  2408.8743   577.4500   3422.0500
500    1.8110     3.2235     8.9690     367.7000   438.9500    n/a        6214.4000
1000   7.3950     12.8190    58.4845    254.4500   269.8500    n/a        n/a
2500   46.2975    79.8670    160.3200   410.0000   407.8000    n/a        n/a

Average CPU seconds over 20 instances.

of linear and unit weight functions. For the case of nonlinear weight functions, instead of invoking an external library (such as, e.g., Matlab), we preferred to perform a binary search over the definition range. The impact on the overall CPU time was however negligible.

Table 1 reports the results for profit functions (11) and general weight functions (12), with real variables xj. The entries give:

n, number of variables;
Heur, average solution value produced by Heur;
LS_D, average solution value produced by Heur followed by LS (deterministic implementation);
LS_R, average solution value produced by Heur followed by LS (randomized implementation);
Ipopt_1, average solution value produced by Ipopt with a single starting point;
Ipopt_10, average solution value produced by Ipopt with 10 starting points;
SC(5'), average solution value produced by SC-MINLP with 5 min time limit;
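The binary search mentioned above — finding, for a continuous nondecreasing weight function, the point where it reaches a target value over a bounded range — can be sketched as follows (our own illustrative code, with hypothetical names):

```python
def zero_by_bisection(gj, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with gj(x) ~= target for a continuous nondecreasing
    gj, by bisection; returns hi (resp. lo) if target is above (resp. below)
    the range of gj on [lo, hi]."""
    if gj(hi) <= target:
        return hi
    if gj(lo) >= target:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gj(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

Since the interval is halved at every step, the number of function evaluations is logarithmic in the ratio of the initial range to the tolerance, which is consistent with the negligible overhead reported above.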

Table 9 Unit weights, real variables.

n     Heur       LS_D       LS_R       Ipopt_1        Ipopt_10       SC(5′)         SC(1h)         SC_UB

10    299.60     302.52     302.43     275.45         298.68         304.62         304.62         304.63
20    519.49     529.73     529.86     451.70         508.39         534.70         534.70         534.71
50    1438.33    1447.83    1448.42    1253.29        1325.99        1451.89        1451.89        1451.90
100   2904.03    2929.78    2929.78    2451.25        2581.27        2937.29        2937.52        2938.10
200   6297.27    6348.63    6348.86    4377.89 (3)    5274.10 (1)    6366.51        6366.82        6386.30
500   13,316.69  13,401.71  13,404.72  6924.83 (5)    7154.56 (5)    12,583.73      13,445.68      13,570.22
1000  31,739.35  32,130.16  32,108.33  20,084.01 (4)  22,219.97 (3)  18,511.94 (5)  32,299.59      32,466.28
2500  71,525.56  72,340.66  72,311.35  28,243.74 (8)  32,386.56 (7)  n/a            36,357.00 (7)  73,498.30

Average solution values over 20 instances (# no solution). The best results are underlined.

Table 10 Unit weights, real variables.

n     Heur     LS_D     LS_R      Ipopt_1  Ipopt_10  SC(5′)    SC(1h)

10    0.0080   0.0045   0.0025    0.0000   0.2000    1.4000    7.2670
20    0.0105   0.0075   0.0065    0.0500   0.3000    3.9500    14.8255
50    0.0375   0.0390   0.0405    0.1500   1.4500    8.6000    34.7540
100   0.0995   0.1265   0.1690    0.3000   2.7500    95.2500   577.1865
200   0.3385   0.4740   0.8430    1.3000   11.3500   167.8500  1318.5550
500   1.4910   2.7785   8.5770    4.0000   33.2000   269.3500  2724.4625
1000  6.4205   11.5460  58.3990   10.9000  90.9500   627.1500  3593.7900
2500  36.2335  68.7270  160.3200  36.0000  373.2000  n/a       6425.4425

Average CPU seconds over 20 instances.

Table 11 Unit weights, integer variables.

n     Heur       LS_D       LS_R       Bonmin_1      Bonmin_10     SC(5′)       SC(1h)         SC_UB

10    299.44     302.46     302.93     281.59        295.08        303.63       303.63         303.57
20    519.17     529.51     530.01     435.43        504.79        533.41       533.41         533.39
50    1437.72    1447.70    1448.46    1145.77 (1)   1372.34       1449.60      1450.69        1450.58
100   2902.26    2929.05    2929.64    2349.93 (1)   2644.59       2923.63      2936.38        2937.33
200   6293.27    6347.64    6347.37    2345.22 (9)   2952.13 (8)   1934.77 (8)  6938.31        6961.38
500   13,307.49  13,396.44  13,400.43  4122.25 (9)   4625.42 (9)   n/a          13,441.66      13,570.34
1000  31,710.51  32,120.26  32,105.46  3241.72 (17)  4590.47 (16)  n/a          13,706.55 (9)  32,466.35
2500  71,470.87  72,324.02  72,288.67  1332.48 (19)  6295.49 (16)  n/a          n/a            n/a

Average solution values over 20 instances (# no solution). The best results are underlined.

Table 12 Unit weights, integer variables.

n     Heur     LS_D     LS_R      Bonmin_1   Bonmin_10  SC(5′)    SC(1h)

10    0.0075   0.0040   0.0030    0.0590     0.0365     7.3800    7.3800
20    0.0110   0.0065   0.0085    0.3415     0.2310     16.2130   16.2130
50    0.0375   0.0355   0.0415    2.3626     0.9160     192.3500  60.0345
100   0.1050   0.1315   0.1830    17.9321    17.8650    364.7500  683.9875
200   0.3430   0.5100   0.8745    158.0251   1665.6129  593.7000  1430.9625
500   1.4950   3.0395   8.7595    745.9173   1815.7170  n/a       2884.9425
1000  6.4930   12.2700  57.8245   3483.2427  584.1623   n/a       4017.9365
2500  35.1820  75.4795  160.3200  1030.4300  1041.8526  n/a       n/a

Average CPU seconds over 20 instances.

Table 13 Zhang–Hua instances, real variables.

n     Heur        LS_D        LS_R        Ipopt_1     Ipopt_10    SC(5′)      SC(1h)      SC_UB

10    418.12      418.90      418.90      412.45      418.39      422.29      422.29      422.29
20    873.44      874.71      874.55      859.42      875.51      878.00      878.00      878.00
50    2260.71     2261.98     2261.62     2239.98     2262.04     2265.16     2265.16     2265.16
100   4532.93     4533.66     4533.66     4282.69     4533.33     4537.93     4538.01     4538.01
200   9087.61     9088.27     9088.27     8579.05     9085.62     9092.39     9092.39     9092.39
500   22,742.08   22,742.14   22,742.14   21,420.84   22,715.37   22,745.86   22,745.86   22,745.86
1000  45,558.29   45,558.62   45,558.62   42,649.20   45,461.25   45,561.49   45,561.90   45,561.90
2500  113,788.34  113,789.23  113,789.23  112,133.03  113,462.50  113,792.70  113,792.70  113,792.70

Average solution values over 20 instances. The best results are underlined.

SC(1h), average solution value produced by SC-MINLP with a 1 h time limit;
SC_UB, average upper bound value produced by SC-MINLP with a 1 h time limit.

The proposed heuristics always produce a solution. For the other algorithms, the number of instances for which no solution was produced (# no solution) is given in brackets, when greater than zero. The companion Table 2 gives the average CPU times for the tested algorithms. Algorithm Heur always produced better average solutions than Ipopt (with a single exception for Ipopt_10, n = 10), with CPU times that are roughly of the same (or lower) order of magnitude with respect to Ipopt_1, and one order of magnitude smaller with respect to Ipopt_10. SC(5′) produced better solutions than Heur (by about 2%) for n ≤ 100, worse solutions (by about 2%) for 200 ≤ n ≤ 500, and no solution at all for n ≥ 1000. SC(1h) produced better solutions than Heur (by about 2%) for n ≤ 200, worse solutions (by about 2%) for 500 ≤ n ≤ 1000, and no solution at all for n ≥ 2500. The computing time of SC-MINLP was between three and five orders of magnitude higher than that of Heur. Note that some of the CPU times in the last two columns are larger than the time limit, due to the way the run time check is implemented in the codes.

The local search algorithms improved the solutions produced by Heur by about 1%, which allowed the overall heuristic to be the winner for all test beds with n ≥ 200, with a tie for n = 100. The CPU times requested by the local search improvements were however between one and two orders of magnitude higher than those of Heur, although still two to three orders of magnitude below SC-MINLP. There is no clear winner between LS_D and LS_R, both for the solution quality and the CPU time.

Tables 3 and 4 give the results for the same functions with integer variables. The information provided is the same as for Tables 1 and 2, but, instead of Ipopt_1 and Ipopt_10, we have here:

Bonmin_1, average solution value produced by Bonmin with a single starting point;
Bonmin_10, average solution value produced by Bonmin with 10 starting points.

The results were even better for Heur. It always produced better average solutions than Bonmin (with a single exception for Bonmin_10, n = 10), with CPU times that are always one or two orders of magnitude smaller than the two solvers. Also note that Bonmin has several failures for n ≥ 200. Both SC(5′) and SC(1h) produced slightly better solutions than Heur only for n ≤ 20, slightly worse solutions for 50 ≤ n ≤ 100, and no solution at all for n ≥ 200. The local search algorithms behave analogously to the case of real variables.

Tables 5 and 6 refer to profit functions (11) and linear weight functions (13), with real variables xj. The entries give the same information as for Tables 1 and 2. Algorithm Heur always produced better average solutions than Ipopt (again with a single exception for Ipopt_10, n = 10). In this case too the CPU times of Heur are similar to or lower than those of Ipopt_1, and one order of magnitude smaller than those of Ipopt_10. SC-MINLP had here a better behavior as far as solution quality is concerned. SC(5′) produced better solutions than Heur (by about 2%) for n ≤ 200, but no solution at all for n ≥ 500. SC(1h) produced better solutions than Heur (by about 2%) for n ≤ 1000, and considerably worse solutions (by about 15%) for n = 2500. The computing time of SC-MINLP was however between three and five orders of magnitude higher than that of Heur.

The local search algorithms proved to be slightly more effective for the case of linear weight functions. They improved the solutions produced by Heur by about 2%, with CPU times usually comparable with those of Heur (but clearly higher for n = 2500).

Tables 7 and 8 refer to the functions of Tables 5 and 6, but with integer variables. Heur always produced better average solutions than Bonmin (with the "usual" exception), with CPU times that are almost always three or four orders of magnitude smaller. Bonmin almost always failed for n ≥ 500. SC(5′) and SC(1h) produced slightly better solutions than Heur for n ≤ 50, slightly worse solutions for 100 ≤ n ≤ 200, and almost no solution at all for n ≥ 500.
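As a rough illustration of the kind of constructive heuristic being compared in these tables, the following sketch allocates capacity greedily by marginal profit-to-weight ratio. It is a generic textbook-style sketch under assumed callable profit/weight functions, not the actual Heur of the paper; the name `greedy_nlk` and the discretization parameter `steps` are invented for the example:

```python
import math

def greedy_nlk(profit, weight, u, c, steps=100):
    """Generic greedy sketch for a separable nonlinear knapsack
    (NOT the paper's Heur): repeatedly grant a small slice of each
    item's range to the item with the best marginal ratio."""
    n = len(u)
    x = [0.0] * n
    used = 0.0
    delta = [uj / steps for uj in u]  # discretization step per item
    while True:
        best_ratio, best_j, best_dw = 0.0, -1, 0.0
        for j in range(n):
            if x[j] + delta[j] > u[j] + 1e-9:
                continue  # item already at its upper bound
            dp = profit[j](x[j] + delta[j]) - profit[j](x[j])
            dw = weight[j](x[j] + delta[j]) - weight[j](x[j])
            if dp <= 0.0 or used + dw > c + 1e-9:
                continue  # no profit gain, or capacity exceeded
            ratio = dp / dw if dw > 1e-12 else math.inf
            if ratio > best_ratio:
                best_ratio, best_j, best_dw = ratio, j, dw
        if best_j < 0:
            return x  # no improving, feasible increment remains
        x[best_j] += delta[best_j]
        used += best_dw
```

A local-search post-optimization pass, as in the paper, would then try to shift capacity between pairs of items to escape the myopic choices of the greedy phase.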

Table 14 Zhang–Hua instances, real variables.

n     Heur   LS_D    LS_R    Ipopt_1  Ipopt_10  SC(5′)  SC(1h)

10    0.00   0.01    0.01    0.05     0.15      2.80    2.67
20    0.00   0.02    0.02    0.00     0.25      6.57    5.98
50    0.02   0.09    0.10    0.10     0.70      38.36   33.88
100   0.07   0.32    0.39    0.20     1.80      111.53  156.94
200   0.30   1.25    1.63    0.40     4.00      126.10  258.85
500   1.85   7.67    13.37   1.10     11.10     169.44  180.67
1000  7.50   30.39   76.88   2.85     28.00     276.68  287.57
2500  51.94  160.32  160.32  10.25    92.90     878.70  932.91

Average CPU seconds over 20 instances.

Table 16 Zhang–Hua instances, integer variables.

n     Heur   LS_D    LS_R    Bonmin_1  Bonmin_10  SC(5′)   SC(1h)

10    0.00   0.00    0.00    0.20      0.15       2.77     5.07
20    0.00   0.01    0.01    1.00      0.65       42.30    249.48
50    0.01   0.07    0.08    2.10      2.20       118.42   1021.82
100   0.02   0.30    0.35    20.85     19.60      229.56   2400.69
200   0.09   1.18    1.56    274.80    125.55     285.11   3028.68
500   0.50   7.44    13.35   203.50    31.20      354.47   2955.31
1000  1.96   29.92   78.08   2560.10   924.15     609.53   3111.48
2500  12.41  160.32  160.32  2033.45   940.20     1534.33  4446.80

Average CPU seconds over 20 instances.

Table 15 Zhang–Hua instances, integer variables.

n     Heur        LS_D        LS_R        Bonmin_1        Bonmin_10       SC(5′)     SC(1h)     SC_UB

10    417.40      417.45      417.45      413.90          417.60          417.45     417.75     417.75
20    871.50      871.80      872.00      872.10          873.70          871.30     871.60     873.84
50    2259.45     2260.05     2260.25     2152.55 (1)     2261.90         2250.90    2256.65    2263.15
100   4531.95     4532.15     4531.95     4518.90         4533.15         4518.45    4523.20    4536.80
200   9085.35     9085.45     9085.45     8139.25 (2)     8153.10 (2)     9055.05    9056.90    9092.12
500   22,739.90   22,740.10   22,740.10   2249.30 (18)    2250.00 (18)    17,796.02  22,707.30  22,745.07
1000  45,556.85   45,557.05   45,557.05   22,399.15 (10)  22,880.25 (10)  31,185.59  45,526.45  45,560.17
2500  113,787.05  113,787.05  113,787.05  0.00 (20)       5611.25 (16)    43,683.35  89,220.06  113,791.80

Average solution values over 20 instances (# no solution). The best results are underlined.

The local search algorithms behave analogously to the case of real variables. Note that the value of a feasible solution can sometimes be slightly higher than the upper bound value (see, e.g., Table 7 for n = 50, as well as some entries in Table 11), due to the standard precision and convergence tolerance of SC-MINLP.

The local search algorithms improved the solutions by about 2%, which allowed the overall heuristic to beat SC-MINLP also for n = 50. The CPU times of the local search algorithms were of the same order of magnitude (or one higher) than those of Heur. The overall CPU times of the proposed algorithms remained however 4–5 orders of magnitude below those of SC-MINLP.

Tables 9 and 10 give the results for profit functions (11) and unit weight functions, with real variables xj, providing the same information as the previous tables. The behavior of all algorithms was quite close to the one observed for the case of linear weight functions (Tables 5 and 6). Analogous considerations hold for the case of unit weight functions with integer variables (Tables 11 and 12).

Tables 13 and 14 give the results for the Zhang and Hua [12] data sets (convex profits and linear weights), with real variables xj. These instances turn out to be very easy, and SC-MINLP could always find the optimum within less than 1000 s on average. The solutions produced by Heur were within one percent of the optimum for small-size instances, and very close to the optimum for larger instances. Heur (followed by local search) is slightly worse than Ipopt_10 for small-size instances (with smaller CPU times), and slightly better for large-size instances (with larger CPU times). The improvements produced by local search were always very small.

The Zhang and Hua [12] instances with integer variables (Tables 15 and 16) are much harder than their real counterparts. SC-MINLP could find the optimal solutions only for n = 10. For 20 ≤ n ≤ 100 the best solutions were produced by Bonmin_10. Heur (followed by local search) gave solutions very close to the best values for n ≤ 100, and dominated the other codes for larger n values, with CPU times never exceeding 3 min.

By observing the upper bound values in columns SC_UB of all tables, one can additionally observe that the quality of the heuristic solutions was always within one or two percent not only of that of the solutions produced by the exact solvers, but also of the global optimum.

Additional experiments were performed in order to check whether the time limit fixed in the experiments was not enough for the solvers. To this end we re-executed SC-MINLP on the first three instances of Table 1, for the case n = 500, with a one-day CPU time limit. It turned out that even so the solver produced exactly the same three solutions obtained within 1 h.

Another set of experiments was aimed at checking whether, for the integer case, much smaller upper bounds on the variable values could produce a deterioration of the performance of the heuristic. By imposing, e.g., uj = 2 for all items, so that the problem moves closer to the binary case, it turned out, on the contrary, that both the solvers and the proposed heuristics almost always find the optimal solution.

5. Conclusions

We have considered a very difficult nonlinear knapsack problem that nonlinear solvers are unable to handle for instances of realistic size. We have proposed a fast heuristic algorithm, enriched by a local search post-optimization procedure. Extensive computational comparisons with exact solvers have shown that the proposed approach provides quick solutions of unexpected quality, i.e., within one or two percent of the global optimum, with CPU times that are some orders of magnitude smaller. The results are very satisfactory for all considered cases: general nonconvex nonconcave profits (with concave weights, linear weights and unit weights), and convex profits with linear weights, both for the case of real and integer variables. Although the solutions produced by the proposed heuristics are, for small-size instances, sometimes worse than those obtained by other solvers, they are obtained quickly, hence the heuristics constitute a good trade-off between solution quality and computational effort. The simplicity of the proposed algorithms makes them promising candidates for extension to other nonlinear problems.

Acknowledgements

Research supported by Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR), Italy. We thank our student Michele Dinardo for the help given in the implementation of the algorithms.

References

[1] Bonmin. http://projects.coin-or.org/Bonmin.
[2] Bretthauer KM, Shetty B. The nonlinear knapsack problem—algorithms and applications. European Journal of Operational Research 2002;138:459–72.
[3] Couenne. http://projects.coin-or.org/Couenne.
[4] D'Ambrosio C, Lee J, Wächter A. A global-optimization algorithm for mixed-integer nonlinear programs having separable non-convexity. In: Fiat A, Sanders P, editors. ESA 2009 (17th annual European symposium, Copenhagen, Denmark, September 2009). Lecture notes in computer science, vol. 5757, 2009. p. 107–18.
[5] Horst R, Tuy H. Global optimization: deterministic approaches. Berlin, Germany: Springer; 1990.
[6] Ibaraki T, Katoh N. Resource allocation problems. Cambridge, MA: MIT Press; 1988.
[7] Ipopt. http://projects.coin-or.org/Ipopt.
[8] Kameshwaran S, Narahari Y. Nonconvex piecewise linear knapsack problems. European Journal of Operational Research 2009;192:56–68.
[9] Kellerer H, Pferschy U, Pisinger D. Knapsack problems. Berlin, Germany: Springer; 2004.
[10] Martello S, Toth P. Knapsack problems: algorithms and computer implementations. Chichester: John Wiley & Sons; 1990.
[11] Ohtagaki H, Nakagawa Y, Iwasaki A, Narihisa H. Smart greedy procedure for solving a nonlinear knapsack class of reliability optimization problems. Mathematical and Computer Modelling 1995;22:261–72.
[12] Zhang B, Hua Z. A unified method for a class of convex separable nonlinear knapsack problems. European Journal of Operational Research 2008;191:1–6.