OPTIMA 78, Mathematical Programming Society Newsletter, November 2008

Mathematical Programming Computation: A New MPS Journal
William Cook, Georgia Institute of Technology
Thorsten Koch, Konrad-Zuse-Zentrum Berlin
September 28, 2008

The Mathematical Programming Society will publish the new journal Mathematical Programming Computation (MPC) beginning in 2009. The journal is devoted to computational issues in mathematical programming, including innovative software, comparative tests, modeling environments, libraries of data, and/or applications. A main feature of the journal is the inclusion of accompanying software and data with submitted manuscripts. The journal’s review process includes the evaluation and testing of the accompanying software. Where possible, the review will aim for verification of reported computational results.

1 Background

In January 2007, Martin Grötschel proposed that MPS consider the creation of a computationally-oriented journal. The proposal was described in an email to Rolf Möhring. The following quote from the email provides a good summary of the intention of the proposal.

They see a weakness in our journal landscape concerning information about good codes, the distribution of codes themselves, of data and data collections and everything that has to do with computational aspects of this kind.

Rolf Möhring formed a committee to explore the idea of a new journal, with members Robert Bixby, William Cook (Chair), Thorsten Koch, Sven Leyffer, David Shmoys, and Stephen Wright. Email discussions were carried out between April 2007 and June 2007, and a short report was sent to Rolf Möhring to wrap up the committee's work. The consensus of the committee was [continues on page 7]

In this issue: Structural Convex Optimization; MPS Chair's Column; Smooth Semidefinite Optimization.

How to advance in Structural Convex Optimization
Yurii Nesterov
October 2008

Abstract In this paper we try to analyze the common features of the recent advances in Structural Convex Optimization: polynomial-time interior-point methods, the smoothing technique, minimization in relative scale, and minimization of composite functions.

Keywords convex optimization · nonsmooth optimization · complexity theory · black-box model · optimal methods · structural optimization · smoothing technique

Mathematics Subject Classification (2000) 90C06 · 90C22 · 90C25 · 90C60

Convex Optimization is one of the rare fields of Numerical Analysis which benefit from the existence of a well-developed complexity theory. In our domain, this theory was created in the middle of the seventies in a series of papers by A. Nemirovsky and D. Yudin (see [8] for a full exposition). It consists of three parts:
– Classification and description of problem instances.
– Lower complexity bounds.
– Optimal methods.

In [8], the complexity of a convex optimization problem was linked with its level of smoothness, introduced by Hölder conditions on the first derivatives of the functional components. It was assumed that the only information the optimization methods can learn about a particular problem instance is the values and derivatives of these components at some test points. This data is reported by a special unit called the oracle, and it is local, which means that it does not change if the function is modified far enough from the test point. This model of interaction between the optimization scheme and the problem data is called the local Black Box. At the time of its development, this concept fitted the existing computational practice very well: the interface between general optimization packages and the problem data was established by Fortran subroutines created independently by the users.

The Black-Box framework allows us to speak about lower performance bounds for different problem classes in terms of informational complexity. That is the lower estimate for the number of calls of the oracle which is necessary for any optimization method in order to guarantee delivering an ε-solution to any problem from the problem class. In this performance measure we do not include at all the complexity of the auxiliary computations of the scheme.

Let us present these bounds for the most important classes of optimization problems, posed in the form

min f(x), x ∈ Q,   (1)

where Q ⊆ Rⁿ is a bounded closed convex set (‖x‖ ≤ R for x ∈ Q), and the function f is convex on Q. In the table below, the first column indicates the problem class, the second one gives an upper bound for the allowed number of calls of the oracle in the optimization scheme¹, and the last column gives the lower bound for the analytical complexity of the problem class, which depends on the absolute accuracy ε and the class parameters.

[Table (2), listing the problem classes C1, C2, and C3 with the corresponding upper bounds on oracle calls and lower complexity bounds, is not reproduced here.]

This paper was written during the author's visit to IFOR (ETH, Zurich). The author expresses his gratitude to the Scientific Director of this center, Hans-Jacob Lüthi, for his support and excellent working conditions.

Yurii Nesterov
Center for Operations Research and Econometrics (CORE), Université catholique de Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium.
Tel.: +32-10-474348, Fax: +32-10-474301, e-mail: [email protected]

It is important that these bounds are exact. This means that there exist methods whose efficiency estimates on the corresponding problem classes are proportional to the lower bounds. The corresponding optimal methods were developed in [8,9,19,22,23]. For further reference, we present a simplified version of the optimal method [9]² as applied to problem (1) with f ∈ C2:

Choose a starting point y₀ ∈ Q and set x₋₁ = y₀. For k ≥ 0 iterate: [the two-step recursion (3), a gradient step taken from an extrapolated point, is not reproduced here].

As we see, the complexity of each iteration of this scheme is comparable with that of the simplest gradient method. However, the rate of convergence of method (3) is much faster.

After a certain period of time, it became clear that, despite its mathematical excellence, the Complexity Theory of Convex Optimization has a hidden drawback. Indeed, in order to apply convex optimization methods, we need to be sure that the functional components of our problem are convex. However, we can check convexity only by analyzing the structure of these functions³: if our function is obtained from basic convex functions by convex operations (summation, maximum, etc.), we conclude that it is convex. If not, then we have to apply general optimization methods, which usually do not have theoretical guarantees for global performance. Thus, the functional components of the problem are not in a black box at the moment we check their convexity and choose a minimization scheme. However, we put them into the black box for the numerical methods. That is the main conceptual contradiction of standard Convex Optimization.

Intuitively, we always hope that the structure of the problem can be used to improve the performance of minimization schemes. Unfortunately, structure is a very fuzzy notion, which is quite difficult to formalize. One possible way to describe structure is to fix the analytical type of the functional components. For example, we can consider problems with linear constraints only. This can help, but the approach is very fragile: if we add just a single constraint of another type, then we get a new problem class, and all the theory must be redone from scratch.

On the other hand, it is clear that, having the structure at hand, we can play a lot with the analytical form of the problem. We can rewrite the problem in many equivalent settings using non-trivial transformations of variables or constraints, introducing additional variables, etc. However, this would serve almost no purpose without fixing a clear final goal. So let us try to understand what it could be. As usual, it is better to look at classical examples. In many situations the sequential reformulations of the initial problem can be seen as part of a numerical scheme: we start from a complicated problem P and, step by step, change its structure up to the moment we get a trivial problem (or a problem which we know how to solve):

P → … → (f*, x*).

A good example of such a strategy is the standard approach for solving a system of linear equations

Ax = b.

We can proceed as follows:
1. Check if A is symmetric and positive definite. Sometimes this is clear from the origin of the matrix.
2. Compute a Cholesky factorization of this matrix, A = LLᵀ, where L is a lower-triangular matrix, and form the two auxiliary systems Ly = b, Lᵀx = y.
3. Solve these systems by sequential exclusion of variables.

Imagine for a moment that we do not know how to solve systems of linear equations. In order to discover the above scheme we should apply the following

Golden Rules   (4)
1. Find a class of problems which can be solved very efficiently.
2. Describe the transformation rules for converting the initial problem into the desired form.
3. Describe the class of problems for which these transformation rules are applicable.

In our example, the easy class is the class of linear systems with triangular matrices.

In Convex Optimization, these rules have already been used several times for breaking down the limitations of Complexity Theory. Historically, the first example of this type is the theory of polynomial-time interior-point methods (IPM) based on self-concordant barriers. In this framework, the class of easy problems is formed by problems of unconstrained minimization of self-concordant functions, treated by the Newton method. This know-how is then used in the framework of path-following schemes for solving so-called standard minimization problems. Finally, it can be shown that by a simple barrier calculus this approach can be extended to all convex optimization problems with known structure (see [11,18] for details). The efficiency estimates of the corresponding schemes are of the order of O(υ^(1/2) ln(υ/ε)) iterations of the Newton method, where υ is the parameter of the corresponding self-concordant barrier. Note that for many important feasible sets this parameter is smaller than the dimension of the space of variables. Hence, for pure Black-Box schemes such an efficiency is simply unreachable in view of the lower complexity bound for class C3 (see (2)). It is interesting that formally the modern IPMs look very similar to the usual Black-Box schemes (Newton method plus path-following approach), which were developed in the very early days of Nonlinear Optimization [4]. However, this is just an illusion. For the complexity analysis of polynomial-time IPMs, it is crucial that they employ special barrier functions which do not satisfy the local Black-Box assumptions (see [10] for a discussion).

Footnotes:
1. If this upper bound is smaller than O(n), then the dimension of the problem is really very big, and we cannot afford the method to perform this number of calls.
2. In method (11)-(13) from [9], we can set aₖ = 1 + k/2, since in the proof we only need to ensure aₖ₊₁² − aₖ² ≤ aₖ₊₁.
3. Numerical verification of convexity is an extremely difficult problem.
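The three-step Cholesky scheme above can be made concrete. The following minimal sketch (pure Python, illustrative only; the function names and the small test matrix are ours, and a production code would call a library routine such as LAPACK's) factors a symmetric positive definite matrix and then solves the two triangular systems by sequential exclusion of variables:

```python
import math

def cholesky(A):
    """Factor a symmetric positive definite A as L * L^T, L lower-triangular."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def solve_upper_transposed(L, y):
    """Back substitution for L^T x = y (reads the lower triangle of L)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# A small SPD system: solve A x = b by the three steps described in the text.
A = [[4.0, 2.0], [2.0, 3.0]]
b = [10.0, 9.0]
L = cholesky(A)
x = solve_upper_transposed(L, solve_lower(L, b))
```

Each triangular solve costs O(n²) operations, which is the sense in which triangular systems form the "easy class" of Rule 1.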

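The recursion of method (3) is not reproduced above, so as a hedged illustration, here is a minimal sketch of a fast gradient method from the same family (a constant-step variant with the standard extrapolation sequence; we assume Q = Rⁿ so no projection is needed, and that the Lipschitz constant of the gradient is known; the test problem is ours):

```python
import math

def grad_f(x, c):
    # Gradient of f(x) = 0.5 * sum(c_i * x_i^2); its Lipschitz constant is max(c).
    return [ci * xi for ci, xi in zip(c, x)]

def fast_gradient(x0, c, iters):
    """Constant-step fast gradient method (a standard variant of the scheme
    in [9]); Q = R^n is assumed, so the projection step is trivial."""
    Lip = max(c)
    x_prev, x, t = list(x0), list(x0), 1.0
    for _ in range(iters):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Extrapolated point y_k, then a plain gradient step from y_k.
        y = [xi + (t - 1.0) / t_next * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad_f(y, c)
        x_prev, x = x, [yi - gi / Lip for yi, gi in zip(y, g)]
        t = t_next
    return x

c = [1.0, 100.0]                        # ill-conditioned quadratic
x = fast_gradient([5.0, 5.0], c, 200)
f_val = 0.5 * sum(ci * xi * xi for ci, xi in zip(c, x))
```

Each iteration costs no more than a plain gradient step, yet the scheme enjoys an O(L/k²) guarantee on the objective gap, which is the point made in the text about method (3).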
The second example of using the rules (4) needs more explanation. By certain circumstances, these results were discovered with a delay of twenty years. Perhaps they were too simple. Or maybe they seemed to stand in very sharp contradiction with the rigorously proved lower bounds of Complexity Theory.

Anyway, now everything looks almost evident. Indeed, in accordance with Rule 1 in (4), we need to find a class of very easy problems. And this class can be discovered directly in Table (2)! To see this, let us compare the complexity of the classes C1 and C2 for an accuracy of 1% (ε = 10⁻²). Note that in this case the accuracy-dependent factors in the efficiency estimates vary from ten to ten thousand. So the natural question is:

Can the easy problems from C2 help us somehow in finding an approximate solution to the difficult problems from C1?

And the evident answer is: yes, of course! It is a simple exercise in Calculus to show that we can always approximate a Lipschitz-continuous nonsmooth convex function f on a bounded convex set with uniform accuracy ε > 0 by a smooth convex function with Lipschitz-continuous gradient. We pay for the accuracy of the approximation by a large Lipschitz constant M for the gradient, which should be of the order O(1/ε). Putting this bound for M into the efficiency estimate of C2 in (2), we can see that in principle it is possible to minimize nonsmooth convex functions by oracle-based gradient methods with analytical complexity O(1/ε).⁴ But what about the Complexity Theory? It seems that it was proved that such efficiency is just impossible.

It is interesting that in fact we do not get any contradiction. Indeed, in order to minimize a smooth approximation of a nonsmooth function by an oracle-based scheme, we need to change the initial oracle. Therefore, from the mathematical point of view, we violate the Black-Box assumption. On the other hand, in the majority of practical applications this change is not difficult. Usually we can work directly with the structure of our problem, at least in the cases when it is created by us.

Thus, the basis of the smoothing technique [12,13] is formed by two ingredients: the above observation, and a trivial but systematic way of approximating a nonsmooth function by a smooth one. This can be done for convex functions represented explicitly in max-form:

f(x) = max { ⟨Ax − b, u⟩ − φ(u) : u ∈ Qd },

where Qd is a bounded convex dual feasible set and φ(u) is a concave function. Then, choosing a nonnegative strongly convex function d(u), we can define a smooth function

fμ(x) = max { ⟨Ax − b, u⟩ − φ(u) − μ·d(u) : u ∈ Qd },   (5)

which approximates the initial objective. Indeed, denoting Dd = max { d(u) : u ∈ Qd }, we get

f(x) ≥ fμ(x) ≥ f(x) − μDd.

At the same time, the gradient of the function fμ is Lipschitz-continuous with a Lipschitz constant of the order O(1/μ) (see [12] for details). Thus, we can see that with an implementable definition (5) we get the possibility to solve problem (1) in O(1/ε) iterations of the fast gradient method (3).

In order to see the magnitude of the improvement, let us look at the following example:

[Problem (6), a minimax problem over the standard simplex Δn ⊂ Rⁿ, is not reproduced here.]

The properly implemented smoothing technique ensures a rate of convergence for problem (6) [display not reproduced], while the standard subgradient methods (e.g. [14]) can guarantee only a slower rate [display not reproduced]. Thus, up to a logarithmic factor, for obtaining the same accuracy, the methods based on the smoothing technique need only the square root of the number of iterations of the usual subgradient scheme. Taking into account that the subgradient methods are usually allowed to run many thousands or even millions of iterations, the gain of the smoothing technique in computational time can be enormously big.

It is interesting that for problem (6) the computation of the smooth approximation is very cheap. Indeed, let us use for smoothing the entropy function [display not reproduced]. Then the smooth approximation (5) of the objective function in (6) has a compact representation [display not reproduced], and the complexity of the oracle for f(x) and fμ(x) is similar. Note that again, as in polynomial-time IPM theory, we apply a standard oracle-based method ((3) in this case) to a function which does not satisfy the Black-Box assumptions.

An inexplicable blindness to the possibility of reducing the complexity of nonsmooth optimization problems with known structure is not restricted to the smoothing technique. As was shown in [7], very similar results can be obtained by the extra-gradient method of G. Korpelevich [6], using the fact that this method is optimal for the class of variational inequalities with Lipschitz-continuous operator (for these problems it converges as O(1/t)). Actually, in verbal form, the optimality of the extra-gradient method was known already for a couple of decades. However, a rigorous proof of this important fact and a discussion of its consequences for Structural Nonsmooth Optimization were published only in [7], after the discovery of the smoothing technique.

To conclude this section, let us discuss the last example of acceleration strategies in Structural Optimization.

4. It is easy to see that the standard subgradient methods for nonsmooth convex minimization indeed need O(1/ε²) operations to converge. Consider the univariate function f(x) = |x|, x ∈ R, and the corresponding subgradient process [displays not reproduced]. However, the step-size sequence used there is optimal [8].
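The entropy-smoothing computation mentioned above can be illustrated concretely. For a maximum of affine pieces, smoothing with an entropy prox-term reduces to a log-sum-exp expression; the following sketch (with illustrative data of our choosing, and with the constant μ ln n subtracted so that fμ ≤ f) checks the uniform bound f − μDd ≤ fμ ≤ f with Dd = ln n:

```python
import math

def f_max(g):
    """Nonsmooth objective: f = max_i g_i (a max of affine pieces,
    evaluated at a fixed point, so g_i = <a_i, x> - b_i)."""
    return max(g)

def f_smooth(g, mu):
    """Entropy smoothing: f_mu = mu * log(sum_i exp(g_i / mu)) - mu * log(n),
    computed stably by shifting with the maximum of g."""
    n = len(g)
    m = max(g)
    s = sum(math.exp((gi - m) / mu) for gi in g)
    return m + mu * math.log(s) - mu * math.log(n)

g = [0.3, -1.2, 0.27, 0.1]    # illustrative values of the affine pieces
mu = 0.05
f = f_max(g)
fmu = f_smooth(g, mu)
```

The oracle cost of fmu is essentially that of f itself (one pass over the pieces), which is the cheapness claimed in the text, while the gradient of fmu (a softmax-weighted average of the pieces) has Lipschitz constant of order 1/μ.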

Consider the problem of minimizing a composite objective function

min { f(x) + Ψ(x) },   (7)

where the function f is convex and differentiable on dom Ψ with Lipschitz-continuous gradient, and the function Ψ is an arbitrary closed convex function. Since Ψ can even be discontinuous, in general this problem is very difficult. However, if we assume that the function Ψ is simple, then the situation changes. Indeed, suppose that for any ȳ ∈ dom Ψ we are able to solve explicitly the auxiliary optimization problem [display (8) not reproduced] (compare with (3)). Then it becomes possible to develop for problem (7) fast gradient methods (similar to (3)) which have a rate of convergence of the order O(1/k²) (see [15] for details; a similar technique was developed in [3]). Note that the formulation (7) can also be seen as a part of Structural Optimization, since we use the knowledge of the structure of the objective function directly in the optimization methods.

Conclusion

In this paper, we have considered several examples of significant acceleration of the usual oracle-based methods. Note that the achieved progress is visible only because of the supporting complexity analysis. It is interesting that all these methods have prototypes proposed much earlier:
– Optimal method (3) is very similar to the heavy point method

xₖ₊₁ = xₖ − α∇f(xₖ) + β(xₖ − xₖ₋₁),

where α and β are some fixed positive coefficients (see [20] for historical details).
– Polynomial-time IPMs are very similar to some variants of the classical barrier methods [4].
– The idea of applying smoothing for solving minimax problems is also not new (see [21] and the references therein).

At certain moments of time, these ideas were quite new and attractive. However, they did not result in a significant change in computational practice, since they were not provided with a convincing complexity analysis. Indeed, many other schemes have similar theoretical justifications, and it was not clear at all why these particular suggestions deserved more attention. Moreover, even now, when we know that the modified variants of some old methods give excellent complexity results, we cannot say too much about the theoretical efficiency of the original schemes.

Thus, we have seen that in Convex Optimization the complexity analysis plays an important role in selecting the promising optimization methods among hundreds of others. Of course, it is based on investigation of the worst-case situation. However, even this limited help is important for choosing promising directions for further research. This is true especially now, when the development of Structural Optimization makes the problem settings and the corresponding efficiency estimates more and more interesting and diverse. The size of this paper does not allow us to discuss other interesting settings of Structural Convex Optimization (e.g. optimization in relative scale [16,17]). However, we hope that even the presented examples can help the reader to find new and interesting research directions in this promising field (see, for example, [1,2,5]).

References
1. A. d'Aspremont, O. Banerjee, and L. El Ghaoui. First-Order Methods for Sparse Covariance Selection. SIAM Journal on Matrix Analysis and its Applications, 30(1), 56-66 (2008).
2. O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model Selection Through Sparse Maximum Likelihood Estimation. Journal of Machine Learning Research, 9, 485-516 (2008).
3. A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Threshold Algorithm for Linear Inverse Problems. Research Report, Technion (2008).
4. A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley, New York, 1968.
5. S. Hoda, A. Gilpin, and J. Pena. Smoothing techniques for computing Nash equilibria of sequential games. Research Report, Carnegie Mellon University (2008).
6. G.M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon 12, 747-756 (1976).
7. A. Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15, 229-251 (2004).
8. A. Nemirovsky and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
9. Yu. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Mathematics Doklady), 269(3), 543-547 (1983).
10. Yu. Nesterov. Interior-point methods: An old and new approach to nonlinear programming. Mathematical Programming, 79(1-3), 285-297 (1997).
11. Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, Boston, 2004.
12. Yu. Nesterov. Smooth minimization of non-smooth functions. CORE Discussion Paper 2003/12 (2003). Published in Mathematical Programming, 103(1), 127-152 (2005).
13. Yu. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1), 235-249 (2005).
14. Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming (2007).
15. Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76 (2007).
16. Yu. Nesterov. Rounding of convex sets and efficient gradient methods for linear programming problems. Optimization Methods and Software, 23(1), 109-135 (2008).
17. Yu. Nesterov. Unconstrained convex minimization in relative scale. Accepted by Mathematics of Operations Research.
18. Yu. Nesterov and A. Nemirovskii. Interior-Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia, 1994.
19. B.T. Polyak. A general method of solving extremum problems. Soviet Mat. Dokl. 8, 593-597 (1967).
20. B. Polyak. Introduction to Optimization. Optimization Software, New York, 1987.
21. R. Polyak. Smooth Optimization Methods for Minimax Problems. SIAM J. Control and Optimization, 26(6), 1274-1286 (1988).
22. N.Z. Shor. Minimization Methods for Nondifferentiable Functions. Springer-Verlag, Berlin, 1985.
23. S.P. Tarasov, L.G. Khachiyan, and I.I. Erlikh. The Method of Inscribed Ellipsoids. Soviet Mathematics Doklady, 37, 226-230 (1988).

Steve Wright
MPS Chair's Column – Optima 78
15 October 2008

We note with sadness the passing of Garth McCormick, who died in Maryland on August 24. Professor McCormick's 1968 book with Anthony Fiacco on logarithmic barrier methods for nonlinear programming was acclaimed at the time of its publication, then had a second life during the interior-point revolution starting in the mid-1980s, when it was recognized for its seminal contributions. He was also a pioneer in computational differentiation, and worked on many applications of nonlinear programming. We send our deepest condolences to his family.

I'm delighted with the launch of the new MPS journal Mathematical Programming Computation, whose first issue will appear in 2009. An article by editor-in-chief Bill Cook and general editor Thorsten Koch describing the genesis of the journal, its aims and scope, and the large and distinguished editorial staff appears in this issue of Optima. MPC (as we inevitably refer to it) is innovative in several respects, including its strong focus on software and computation, its mechanism for evaluating contributions, and its means of distribution (freely available online, with a print edition published by Springer and included in MPS membership). My thanks to all who serve as editors and advisors, and especially to Bill and Thorsten for their tireless work in booting up the journal. Now it is up to us to do some great computational research and send our papers (and software) to MPC!

Planning for our Society's flagship event, ISMP in Chicago (August 23-29, 2009), continues to pick up speed, as you can see from the web site www.ismp2009.org. The plenary and semi-plenary speakers have been announced, as has the list of clusters and cluster organizers. Registration and abstract submission through the web site will be available in November, along with hotel information. Please contact a cluster organizer in the relevant area if you wish to speak or organize a session.

Prizes sponsored by MPS and fellow societies will be awarded at the opening ceremony of ISMP, at Orchestra Hall in Chicago. These are the leading prizes in our discipline, and I urge you to visit the Society's web site www.mathprog.org for information on prizes and the current calls for nominations, and to think about nominating your most deserving colleagues for these honors.

To those members who are not yet regular users of Optimization Online (www.optimization-online.org), I urge you to take a look at this valuable service. It contains a repository of optimization preprints, which you can browse and contribute to. You can also sign up for a digest that is emailed at the start of each month. The cgi scripts underlying the site have held up well without extensive modification since being written by Jean-Pierre Goux in 2000, though there is the occasional hiccup because of server transitions or security problems at Argonne, the host site. In the coming months, we hope to find the time, resources, and expertise to improve the utility of the site for optimization researchers and users.

There are a number of Society initiatives in the pipeline that you'll hear more about in future columns. We have been working on a new web site, which will be brought online at the current URL in the coming months. The new site will be easier to maintain and will scale better as the amount of content grows. We're also working on revisions to the Society's constitution and by-laws, modernizing them and bringing them into line with current practice, and into conformity with the standards expected of nonprofit organizations. The revisions to the by-laws have been quite extensive, with the Society's council and executive committee deeply involved in the process. We will be presenting the modified constitution to members in the coming months, with a view to ratifying it at the Society's business meeting at ISMP 2009.

You should by now have received membership renewal notices for next year's membership in MPS. 2009 will be a banner year for the Society, because of ISMP, the new journal, and the other initiatives mentioned above and in your renewal letter. I look forward with keen anticipation to your continued participation, especially those members who joined as a result of their attendance at ICCOPT 2007.

Application for Membership

Mail to: Mathematical Programming Society, 3600 University City Sciences Center, Philadelphia, PA 19104-2688, USA.

I wish to enroll as a member of the Society. My subscription is for my personal use and not for the benefit of any library or institution.
[ ] I will pay my membership dues on receipt of your invoice.
[ ] I wish to pay by credit card (Master/Euro or Visa).

Cheques or money orders should be made payable to The Mathematical Programming Society, Inc. Dues for 2008, including subscription to the journal Mathematical Programming, are US $85. Dues for retired members are $40. Student applications: dues are $20; have a faculty member verify your student status and send the application with dues to the above address.

credit card no. / expiration date
family name / mailing address
telephone no. / telefax no. / e-mail
institution / faculty verifying status / signature

[continues from page 1]

to recommend that MPS possibly move forward with a web-based journal. An MPC proposal was delivered to the MPS Council in September and approved on November 11, 2007. Following this, negotiations began with Springer Verlag concerning possible distribution of the journal. On July 9, 2008, the MPS Council unanimously approved the following two motions.

1. Council approves the establishment of Mathematical Programming Computation (MPC) as a journal of the Society, following the guidelines proposed in the attached document "Mathematical Programming Computation: Notes on a New MPS Journal", with William Cook as the first editor-in-chief and Thorsten Koch as the first general editor. Council further approves the initial advisory board listed in this document.

2. Council approves the proposed contract with Springer-Verlag GmbH attached to this message, concerning the publication of Mathematical Programming Computation.

And MPC was off and running! The formation of the MPC Editorial Board was completed in August 2008, and the first manuscript was submitted to MPC on September 9, 2008.

The directors of the INFORMS Optimization Society and the SIAM Activity Group on Optimization have been contacted regarding MPC. Both organizations strongly support the plans for the journal. Discussions with the COIN-OR Technical Board have taken place over the past year, focusing on possible connections between MPC and the COIN-OR services.

2 Journal Distribution

MPC will be published together with Springer Verlag, with the first volume, consisting of four issues, appearing in 2009. The partnership with Springer creates an attractive combination of accessibility for both authors and academic institutions. All MPS members will receive print versions of the journal as part of their membership benefits. The contents of the journal will be made freely available on the society-run MPC web site mpc.zib.de, housed at the Konrad-Zuse-Zentrum Berlin (ZIB). Supplementary material will be included on the web site, supporting the computational studies described in the journal articles.

3 Aims and Scope

MPC publishes original research articles concerning computational issues in mathematical programming. Topics covered in MPC include linear programming, convex optimization, nonlinear optimization, stochastic optimization, robust optimization, integer programming, combinatorial optimization, global optimization, network algorithms, and modeling languages. MPC supports the creation and distribution of software and data that foster further computational research.

4 Information for Authors

Manuscript
Only articles written in the English language will be considered for publication. There is no pre-set page limit on articles, but the journal encourages authors to be concise. The length of the manuscript will be taken into consideration in the review process. Authors should aim to present summaries of computational tests rather than long tables of individual results. Detailed tables and log files can be included in supplementary material made available on the journal's web site. Articles should give a general description of the software, its scope, and the algorithms used. Rather than long presentations of well-known algorithms, authors are encouraged to give details that deviate from the known state of the art: specific design decisions, their consequences, and implementation details.

Software
Computer codes must be accompanied by a clear description of the environment in which they are expected to be built, including instructions on how to obtain any required third-party packages. Clear and easy-to-follow instructions must be given on how to build and run the author's software, and how to use it to recompute any of the computational results given in the article. The opinion of the reviewers concerning this aspect of the provided material is a considerable factor in the editorial decision process. Another factor is the extent to which the reviewers are able to verify the reported computational results. To these aims, authors are highly encouraged to provide the source code of their software. Submitted software is archived with the corresponding research articles. The software is not updated, and the journal is not intended to be the point of distribution for the software. The author's licensing information is included with the archived software. In case the software is no longer available through other means, MPC will distribute it on individual request under the license given by the author. Our intent is to at least partly remedy today's situation, where it is often impossible to compare new results with those computed by other codes several years ago.

Articles describing software where no source code is made available are acceptable, provided reviewers are given access to executable codes that can be used to evaluate reported computational results. Articles may also provide data, their description, and analysis. Articles not providing any software or data will be considered, provided they advance the state of the art regarding a computational topic.

Submission
Authors are invited to submit articles for possible publication in MPC. Articles can be submitted in Adobe PDF format through the journal's web-based system at mpc.zib.de. Software and supplementary material can also be submitted through this system. Software should be delivered as a zip or gzipped-tar archive file that unpacks into a directory reflecting the name of the software.

Review
Articles within the scope of the journal will receive a rigorous review. The editorial board will strive to have papers reviewed within a four-month period. This target will be extended in cases of exceptionally long or difficult manuscripts. The review of articles describing software will include an evaluation of the

computer codes received with the submitted Production Editor • Gerhard Reinelt (University of manuscript. The criteria used in the The Production Editor is responsible for Heildelberg): Combinatorial software review include the following points. building and maintaining the MPS web Optimization 1. The innovation, breadth, and depth of distribution of the journal, including an • Kim-Chuan Toh (National University of the contribution. on-line submission process. The initial Singapore): Convex Optimization 2. An evaluation of the progress in Production Editor is Wolfgang Dalitz (ZIB). performance and features compared Associate Editors with existing software. Advisory Board Papers are assigned to an Associate Editor 3. The conditions under which the An Advisory Board provides general by one of the Area Editors or by the Editor- software is available. oversight of the journal. Membership on in-Chief. The AE will seek to obtain reviews 4. The availability and quality of user the board is subject to approval by the MPS from at least two referees (one of whom documentation. Publications Committee and the MPS could be the AE), or in the case of weaker 5. The accessibility of the computer code; Council. The initial board consists of the papers a single negative report. The identity the ease with which a developer can following members. of the assigned AE is not revealed to the make modifications. • Robert Bixby (Rice University) author of the paper. The members of the • Donald Goldfarb (Columbia University) AE board are as follows. 
5 Editorial Board • Nick Gould (Rutherford Appleton • Shabbir Ahmed, Georgia Tech The structure of the MPC Editorial Board Laboratory) • Samuel Burer, University of Iowa is similar to that of the Mathematical • Martin Grötschel (Konrad-Zuse- • Alberto Caprara, University of Bologna Programming Series A board, with an Zentrum Berlin) • Sanjeeb Dash, IBM TJ Watson Research additional team of Technical Editors to • David Johnson (AT&T Research) Center carry out software evaluations. It has been • Kurt Mehlhorn (Max-Planck-Institut • Camil Demetrescu, University of Rome suggested that MPC adopt the flat model Saarbrücken) • Matteo Fischetti, University of Padova used in SIAM journals, with the aim of • Hans Mittelmann (Arizona State • Emmanuel Fragniere, HEG, Geneva reducing average review times. Although we University) • Michael P. Friedlander, University of are not adopting this SIAM-like structure, • (Georgia Tech) British Columbia this point can be revisited if a significant • Jorge Nocedal (Northwestern • Jacek Gondzio, University of Edinburgh percentage of review times are above the University) • Philip E. Gill, University of California four-month target. • Michael Trick (Carnegie Mellon San Diego University) • Oktay Günlük, IBM TJ Watson Research Center Editor-in-Chief • Robert Vanderbei (Princeton University) • David Williamson (Cornell University) • Michal Kocvara, University of The Editor-in-Chief has the overall Birmingham responsibility for the journal. The duties Area Editors • Adam N. 
Letchford, Lancaster include the formation of the Editorial University Area Editors have direct contact with Board, the establishment of guidelines and • Andrea Lodi, University of Bologna authors, carry out initial reviews of papers, quality standards for the review process, • François Margot, Carnegie Mellon make assignments to Associate Editors and oversight of the timeliness and fairness of University to Technical Editors, and make editorial reviews, the assignment of manuscripts to • Rafael Marti, University of Valencia decisions to accept or decline submissions. Area Editors, light copy editing of final • Laurent Michel, University of The initial board of Area Editors and their manuscripts, and the general promotion of Connecticut areas of interest are given in the following the journal. The initial Editor-in-Chief is • David Pisinger, University of list. William Cook (Georgia Tech). Copenhagen • Daniel Bienstock (Columbia University): • Nikolaos V. Sahinidis, Carnegie Mellon Linear and Integer Programming University General Editor • Robert Fourer (Northwestern • Peter Sanders, University of Karlsruhe The General Editor is responsible for the University): Modeling Languages and • Melvyn Sim, National University of quality of the software evaluation. The Systems Singapore duties include consulting with the Editor- • Andrew V. Goldberg (Microsoft • Huseyin Topaloglu, Cornell University in-Chief on the selection of a board of Research): Graph Algorithms and Data • Michael Ulbrich, Technische Universität Technical Editors, providing guidelines Structures München to the TE board, serving as a contact with • Sven Leyffer (Argonne National • Andreas Wächter, IBM TJ Watson the hardware/software support group, and Laboratory): Nonlinear Optimization Research Center assisting in setting up testing facilities. The • Jeffrey T. Linderoth (Univ. 
of • Renato Werneck, Microsoft Research initial General Editor is Thorsten Koch Wisconsin-Madison): Stochastic • Yin Zhang, Rice University (ZIB). Optimization, Robust Optimization, and Global Optimization continue on page 11 op t i m a 7 8 November 2008 page 

Discussion Column

Smooth Semidefinite Optimization
Alexandre d'Aspremont · Javier Peña · Katya Scheinberg

The smoothing method which Yurii Nesterov describes in his article turns out not only to be an important theoretical advance in first order methods, but also an inspiration for new approaches to many large scale convex optimization problems.

We present two articles discussing the use of smooth first order methods in two different settings, for which classical approaches such as interior point methods failed to produce efficient methods. One is a classical convex nonsmooth setting: semidefinite programming. While semidefinite programs are inherently non-smooth, the smoothing and projection subproblems formed in the smoothing argument detailed in the previous article can be solved explicitly for a wide class of semidefinite optimization problems. This means that smooth first-order methods offer an alternative to interior point methods for solving large-scale semidefinite programs.

The other setting is the computation of Nash equilibria of large sequential two-person, zero-sum games with imperfect information. The smoothing approach has enabled the authors to find near-equilibria for a four-round model of Texas Hold'em poker, a problem that is several orders of magnitude larger than what was previously computable. A poker player based on the resulting strategies turns out to be superior to most automatic poker players and is competitive with the best human poker players.

1. Introduction

Semidefinite optimization has received a significant amount of attention since [NN94] extended the classic complexity analysis of Newton's method to a much broader class of functions. Since then, and despite their somewhat abstract nature, semidefinite programs have found a wide array of applications in engineering (see [BV04]), combinatorial optimization (see [Ali95,GW95]), statistics (see [BEGd07]) or machine learning (see [SBXD05,WS06,LCB+02]) for example. A number of efficient numerical packages have been developed to solve relatively large semidefinite programs using interior point algorithms based on Newton's method (see [Mit03] for an early survey). These codes exhibit both rapid convergence and excellent reliability. In particular, the quadratic convergence of Newton's method near the optimum means that high precision solutions can be obtained by running only a few more iterations. However, because these methods form second order (Hessian) information to compute the Newton step, they have very high memory requirements, as well as a very high numerical cost per iteration.

At the other end of the complexity spectrum lie bundle type methods (see [HR00,Ous00] for example), which directly apply nonsmooth optimization methods to semidefinite problems after an appropriate choice of subdifferential. These methods only require forming a subgradient at each iteration, which in practice means computing a few leading eigenvalues, hence have a very low computational/memory cost per iteration. However, early bundle algorithms had a dependence on the precision target ε > 0 of O(1/ε²), which restricted their use to applications with a very coarse precision target. While some early first order algorithms produced improved bounds of O(1/ε), they remained relatively specialized. Furthermore, bundle methods required a significant number of input parameters to be manually tuned to pick an appropriate subgradient and improve efficiency.

The smoothing argument developed in [Nes05] and discussed in the previous paper solves both these issues at once. It proves convergence of smooth optimization methods on a wide class of semidefinite optimization problems with an explicit bound of O(1/ε) on the total number of iterations, and since the values of most parameters are explicit functions of the problem data, these methods require no parameter tuning.

In what follows, we show how the smooth optimization algorithm detailed in [Nes05] was used in [dEGJL07] to solve a sparse PCA problem. We refer the reader to [Nes07], [Ous00], [Nem04] or [LNM07] for further details and alternative algorithms with similar characteristics.

2. Semidefinite Optimization

For simplicity, we will focus here on the particular semidefinite program formed in [dEGJL07] to bound sparse eigenvalues of covariance matrices. Given a symmetric matrix A ∈ Sn, we seek to solve:

    minimize   λmax(A + U)
    subject to |Uij| ≤ ρ,          (1)

in the variable U ∈ Sn. The objective of this problem is not smooth: when the leading eigenvalue λmax(A + U) is not simple, the function is non-differentiable (see [Lew03,OW95] for details and an explicit derivation of the Hessian). The smoothing argument in [Nes05] first finds a smooth approximation of the objective function in (1), then applies an optimal first-order method to the smooth problem. The benefit of the smoothing step is that optimizing (1) using a nonsmooth method has a complexity of O(1/ε²), while we will see that the smooth approximate problem can be solved with a complexity of O(1/ε).

2.1. Smoothing

In this case, smoothing the objective in (1) turns out to be relatively straightforward. Indeed, the function

    fμ(X) = μ log (Tr exp(X/μ))

is a uniform ε-approximation of λmax(X) when μ = ε/log n. Furthermore, it was shown in [Nes07] or [Nem04] that its gradient ∇fμ(X) is Lipschitz continuous with constant 1/μ. The tradeoff between the quality of the approximation (controlled by ε) and the smoothness of the gradient (given by L = log n/ε) is here completely explicit. We will now apply a smooth minimization algorithm to the smooth approximation of problem (1).

2.2. Smooth minimization

We now apply a slightly more elaborate version of the optimal first-order algorithm detailed in equation (3) in the previous paper. Let us write

    Q1 = {U ∈ Sn : |Uij| ≤ ρ};

the algorithm proceeds as follows.

Repeat:
1. Compute fμ(A + Uk) and ∇fμ(A + Uk)
2. Find Yk = arg min {… : Y ∈ Q1}
3. Find Wk = arg min {… : W ∈ Q1}
4. Set Uk+1 = …
Until gap ≤ ε.

Step 1 is the most computationally intensive step in the algorithm and involves computing a matrix exponential to derive the gradient of fμ(A + U). Steps 2 and 3 are simply Euclidean projections on the unit box in Sn.

2.3. Implementation

To compute the gradient ∇fμ(A + U) without overflows, we can evaluate it as

    ∇fμ(A + U) = exp((A + U − λI)/μ) / Tr exp((A + U − λI)/μ),

having set λ = λmax(A + U). Computing the gradient thus means computing a matrix exponential, which is a classic problem (see [MVL03] for a complete discussion) and has a complexity of O(n³). In fact, this expression for the gradient also provides us with an intuitive interpretation of the connection between the smoothing technique and bundle methods. Because the matrix exponential can also be written

    exp((A + U − λI)/μ) = Σi exp(λi) vi viT,

where λi and vi are the eigenvalues and eigenvectors of the matrix (A + U − λI)/μ in decreasing order, the gradient can be seen as a bundle of subdifferentials vi viT with weights decreasing exponentially with λi. When the objective function is smooth, λmax(A + U) is well separated from the rest of the spectrum and the smooth gradient is close to the nonsmooth one v1v1T; when the leading eigenvalues are tightly clustered, however, the gradient will be a mixture of subdifferentials. In that sense, the smooth semidefinite minimization algorithm can be seen as a bundle method whose weights are adjusted adaptively with smoothness. Pushing the argument a bit further, one can show that only a few eigenvalues are often required to approximate the gradient with a precision sufficient to maintain convergence (see [d'A05] for details).

2.4. Other examples

Another problem instance where smooth optimization methods have proved very efficient is covariance selection (see [dBEG06]). Here we solve

    …

in the variable X ∈ Sn. Here, the objective function is already smooth on the feasible set, whenever α > 0. Here too, the two projection steps can be computed by projecting the spectrum of the current iterate X, hence can be computed with complexity O(n³).

The success of the smoothing technique in the previous examples stems from the fact that smooth uniform approximations could be computed efficiently (analytically, in fact) and that both projections on the feasible set were available in closed form. This situation is far from being exceptional, and the smoothing technique has been applied to several other semidefinite optimization problems which require solving large-scale dense instances with relatively low precision. In fact, projecting on simple feasible sets is often much simpler than forming the corresponding barrier or computing a Newton step.

3. Numerical experiments

First-order methods for semidefinite optimization trade off a much lower cost per iteration against a much higher dependency on the target precision. This means that one cannot hope to obtain solutions up to a precision of 10⁻⁸, which is routinely achieved by interior point algorithms. However, first-order methods do solve semidefinite optimization problems for which running even one iteration of an interior point algorithm is numerically hopeless.

In Figure 1, using the data in [ABN+99], with ρ = 1, we plot CPU time to get a 10² decrease in duality gap. The computing times are summarized below. We notice that solving a dense semidefinite program of size 2000 up to a relative precision of 10⁻² takes less than ten minutes on a quad-core computer with 2Gb of RAM.

Fig. 1. CPU time for solving problem (1) on covariance matrices of increasing dimension n, formed using a gene expression data set.

    n       CPU time
    100     0 m 01 s
    500     0 m 11 s
    1000    1 m 16 s
    2000    9 m 41 s

Acknowledgements. The author would like to acknowledge support from NSF grant DMS-0625352, NSF CDI grant SES-0835550, ONR grant number N00014-07-1-0150, a Howard B. Wentz junior faculty award and a Peek junior faculty fellowship.

Alexandre d'Aspremont: ORFE, Princeton University, Princeton NJ 08544, USA. e-mail: [email protected]

Mathematics Subject Classification (1991): 44A12, 44A60, 90C05, 90C34, 91B28
Correspondence to: Alexandre d'Aspremont
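The ingredients of the method, the smooth approximation fμ, its overflow-safe gradient, and the entrywise projection onto Q1, are simple enough to sketch in a few lines. The following Python/NumPy sketch is illustrative only: it is not the authors' code, and the matrix size, μ, and ρ are arbitrary choices. It checks the uniform approximation bound λmax(X) ≤ fμ(X) ≤ λmax(X) + μ log n behind Section 2.1.

```python
import numpy as np

def f_mu(X, mu):
    """Smooth approximation f_mu(X) = mu*log(Tr exp(X/mu)) of lambda_max(X).
    Shifting the spectrum by lambda_max avoids overflow in the exponential."""
    w = np.linalg.eigvalsh(X)          # eigenvalues in ascending order
    lam = w[-1]
    return lam + mu * np.log(np.sum(np.exp((w - lam) / mu)))

def grad_f_mu(X, mu):
    """Gradient exp((X - lam*I)/mu) / Tr exp((X - lam*I)/mu): a convex
    combination of the spectral projectors v_i v_i^T (unit trace)."""
    w, V = np.linalg.eigh(X)
    e = np.exp((w - w[-1]) / mu)
    e /= e.sum()
    return (V * e) @ V.T               # V @ diag(e) @ V.T

def project_box(U, rho):
    """Euclidean projection onto Q1 = {U : |U_ij| <= rho}: entrywise clipping."""
    return np.clip(U, -rho, rho)

# Sanity check: lambda_max(X) <= f_mu(X) <= lambda_max(X) + mu*log(n)
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
X = (B + B.T) / 2                      # a random symmetric matrix
mu = 0.1
lam_max = np.linalg.eigvalsh(X)[-1]
assert lam_max <= f_mu(X, mu) <= lam_max + mu * np.log(50) + 1e-10
assert abs(np.trace(grad_f_mu(X, mu)) - 1.0) < 1e-10
```

Computing the gradient this way costs one symmetric eigendecomposition, i.e. O(n³), consistent with the complexity quoted in Section 2.3.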

References

[ABN+99] A. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Cell Biology 96 (1999), 6745–6750.
[Ali95] F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM Journal on Optimization 5 (1995), 13–51.
[BEGd07] O. Banerjee, L. El Ghaoui, and A. d'Aspremont, Model selection through sparse maximum likelihood estimation, ICML 2006.
[BV04] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, 2004.
[d'A05] A. d'Aspremont, Smooth optimization with approximate gradient, ArXiv: math.OC/0512344 (2005).
[dBEG06] A. d'Aspremont, O. Banerjee, and L. El Ghaoui, First-order methods for sparse covariance selection, SIAM Journal on Matrix Analysis and its Applications 30 (2008), no. 1, 56–66.
[dEGJL07] A. d'Aspremont, L. El Ghaoui, M.I. Jordan, and G. R. G. Lanckriet, A direct formulation for sparse PCA using semidefinite programming, SIAM Review 49 (2007), no. 3, 434–448.
[GW95] M.X. Goemans and D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42 (1995), 1115–1145.
[HR00] C. Helmberg and F. Rendl, A spectral bundle method for semidefinite programming, SIAM Journal on Optimization 10 (2000), no. 3, 673–696.
[LCB+02] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan, Learning the kernel matrix with semi-definite programming, 19th International Conference on Machine Learning (2002).
[Lew03] A.S. Lewis, The mathematics of eigenvalue optimization, Math. Program., Ser. B 97 (2003), 155–176.
[LNM07] Z. Lu, A. Nemirovski, and R.D.C. Monteiro, Large-scale semidefinite programming via a saddle point Mirror-Prox algorithm, Mathematical Programming 109 (2007), no. 2, 211–237.
[Mit03] H. D. Mittelmann, An independent benchmarking of SDP and SOCP solvers, Mathematical Programming Series B 95 (2003), no. 2, 407–430.
[MVL03] C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Review 45 (2003), no. 1, 3–49.
[Nem04] A. Nemirovski, Prox-method with rate of convergence O(1/T) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM Journal on Optimization 15 (2004), no. 1, 229–251.
[Nes05] Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming 103 (2005), no. 1, 127–152.
[Nes07] Y. Nesterov, Smoothing technique and its applications in semidefinite optimization, Mathematical Programming 110 (2007), no. 2, 245–259.
[NN94] Y. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms in convex programming, Society for Industrial and Applied Mathematics, Philadelphia, 1994.
[Ous00] F. Oustry, A second-order bundle method to minimize the maximum eigenvalue function, Mathematical Programming 89 (2000), no. 1, 1–33.
[OW95] M. L. Overton and R. S. Womersley, Second derivatives for optimizing eigenvalues of symmetric matrices, SIAM J. Matrix Anal. Appl. 16 (1995), no. 3, 697–718.
[SBXD05] J. Sun, S. Boyd, L. Xiao, and P. Diaconis, The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem, SIAM Review (2005).
[WS06] K.Q. Weinberger and L.K. Saul, Unsupervised learning of image manifolds by semidefinite programming, International Journal of Computer Vision 70 (2006), no. 1, 77–90.

continued from page 8

Technical Editors

The review and testing of software is carried out by a board of Technical Editors, with guidance from the General Editor. The TE board has access to hardware/software platforms run by the journal to aid in the review process.

Software to review is assigned to a TE by one of the Area Editors or by the Editor-in-Chief. The review can be carried out by the TE, or a referee can be contacted. The identity of the assigned TE is not revealed to the author of the software. The TE may produce a public report describing the tests that were carried out; the report will be made available as supplementary material on the journal's web site. The initial TE board is made up of the following members.
• Tobias Achterberg, ILOG
• Erling D. Andersen, MOSEK ApS
• David Applegate, AT&T Research
• Oliver Bastert, Dash Optimization
• Pietro Belotti, Lehigh University
• Hande Y. Benson, Drexel University
• Andreas Bley, Konrad-Zuse-Zentrum Berlin
• Brian Borchers, New Mexico Tech
• Jordi Castro, Universitat Politècnica de Catalunya
• Daniel Espinoza, University of Chile
• Armin Fügenschuh, TU Darmstadt
• Andreas Grothey, University of Edinburgh
• Zonghao Gu, Gurobi Optimization
• William Hart, Sandia National Laboratories
• Keld Helsgaun, Roskilde University
• Benjamin Hiller, Konrad-Zuse-Zentrum Berlin
• Leonardo B. Lopes, University of Arizona
• Todd S. Munson, Argonne National Laboratory
• Dominique Orban, École Polytechnique de Montréal
• Ted Ralphs, Lehigh University
• Mohit Tawarmalani, Purdue University
• Stefan Vigerske, Humboldt-Universität Berlin
• Richard A. Waltz, University of Southern California

6 Links for Further Information
• MPC Home Page: mpc.zib.de
• MPS Journals: www.mathprog.org/mps_pubs.htm
• Springer's MPC Page: www.springer.com/math/journal/12532
• Notes on MPC: www.isye.gatech.edu/~wcook/mpc/
• MPC will use the Open Journal Systems (OJS) from Simon Fraser University: pkp.sfu.ca/ojs

Nash equilibria computation via smoothing techniques
Javier Peña
October, 2008

1 Introduction

A sequential game is a mathematical model of the interaction of multiple self-interested players in dynamic stochastic environments with limited information. Poker is a widely popular example of these types of games. Unlike other popular sequential games such as chess or checkers, poker is a game of imperfect information. Consequently, speculation and counter-speculation are inherent features of the game and make the computation of optimal strategies a highly non-trivial task. Optimal strategies must necessarily include tactics such as bluffing and slow playing. Indeed, poker has been identified as a central challenge in artificial intelligence [2] and has become a topic of active research [1,6,7,9].

A fundamental solution concept for sequential games is Nash equilibrium, which is a simultaneous choice of strategies for all players so that each player's choice is optimal given the other players' choices. For a two-person, zero-sum sequential game, the Nash equilibrium problem has the following saddle-point formulation:

    min_{x ∈ Q1} max_{y ∈ Q2} ⟨x, Ay⟩ = max_{y ∈ Q2} min_{x ∈ Q1} ⟨x, Ay⟩.    (1)

Here the sets Q1, Q2 are polytopes associated to the possible sequences of moves of the players, and A is Player 2's payoff matrix [3,8,14,15]. The saddle-point problem (1) can be cast as a linear program, but the resulting formulation is prohibitively large for most interesting games. For instance, the payoff matrix A in (1) for limit Texas Hold'em poker has dimension of order 10¹⁴ × 10¹⁴ and contains more than 10¹⁸ non-zero entries. Problems of this magnitude are far beyond the capabilities of state-of-the-art general-purpose linear programming solvers [4,5]. On the other hand, the equilibrium problem (1) possesses a great deal of structure that makes it particularly well-suited for Nesterov's smoothing techniques. The three main structural features are the saddle-point formulation, the combinatorial structure of the sets Q1, Q2, and a natural factorization of the payoff matrix A. As the sections below explain, these features are nicely compatible with Nesterov's smoothing technique. By taking advantage of these structural properties, we have computed near-equilibria for sequential games whose linear programming formulation would require about three hundred million variables and constraints, and over four trillion non-zero entries. The computed near-equilibria have been instrumental in the design of competitive automatic poker players, including the winner of the 2008 AAAI annual poker competition.

2 Smoothing technique for saddle-point problems

The saddle-point problem (1) can be written as

    min_{x ∈ Q1} f(x) = max_{y ∈ Q2} f̄(y),

where

    f(x) = max {⟨x, Ay⟩ : y ∈ Q2},    f̄(y) = min {⟨x, Ay⟩ : x ∈ Q1}.

The functions f and f̄ are non-smooth, convex and concave respectively. As Nesterov points out in the previous article, the max-form of f and the min-form of f̄ can be readily used to construct smooth approximations. More precisely, assume di is a non-negative and strongly convex function on Qi for i = 1, 2. Such a function is called a prox-function for Qi. For a given μ > 0, the functions

    fμ(x) = max {⟨x, Ay⟩ − μ d2(y) : y ∈ Q2},    f̄μ(y) = min {⟨x, Ay⟩ + μ d1(x) : x ∈ Q1}    (2)

are smooth with Lipschitz gradients of order O(1/μ) and satisfy

    f(x) − μD2 ≤ fμ(x) ≤ f(x),    f̄(y) ≤ f̄μ(y) ≤ f̄(y) + μD1,

where Di = max{di(u) : u ∈ Qi}. By applying an optimal gradient method to the smooth approximations fμ, f̄μ, Nesterov [11,12] devised an algorithm that computes x ∈ Q1, y ∈ Q2 such that

    0 ≤ f(x) − f̄(y) ≤ ε

in O(1/ε) first-order iterations. Each iteration requires only a few elementary operations, some matrix-vector multiplications involving A, and the solution of some subproblems of the form

    max {⟨s,u⟩ − di(u) : u ∈ Qi}    (3)

for i = 1, 2.

3 Nice prox-functions

In order for the above smoothing technique to be an implementable algorithm, problem (3) must be easily computable, since it has to be solved several times at each iteration. We say that a prox-function di for Qi is nice if the solution to problem (3) is easily computable, for example via a closed-form expression.

The polytopes Q1, Q2 arising in the Nash equilibrium problem (1) encode the behavior strategies of the players in the sequential game [13–15]. A behavior strategy for Player i prescribes a probability distribution over the choices available to Player i at every state of the game where it is Player i's turn to make a move. For games in strategic normal form, there is no sequential component and the sets Q1, Q2 are simplexes. In this case each element of Qi is a probability distribution over the set of pure strategies available to Player i. For sequential games in extensive form, the sets Q1, Q2 are the sets of realization plans of the players. A realization plan is a concise encoding of a behavior strategy in terms of the possible sequences of moves of the player [14,15]. Sets of realization plans are polytopes that can be seen as a generalization of simplexes. They are obtained by recursive application of a certain branching operation that encapsulates the relationship between consecutive sequences of moves [8,14,15].

A crucial ingredient in the implementation of a first-order smoothing algorithm for (1) is the construction of nice prox-functions for the sets of realization plans Q1, Q2. Hoda, Gilpin and Peña [8] provide a generic template that constructs nice prox-functions for Q1, Q2 using as building blocks any given family of nice prox-functions for simplexes. In our numerical implementation, we have used the entropy function ln m + Σ_{i=1..m} xi ln xi and the Euclidean distance function ½ Σ_{i=1..m} (xi − 1/m)² as building blocks. Both of these are families of nice prox-functions for simplexes [10,11]. In our computational experiments we obtained consistently faster convergence with the prox-functions induced by the entropy function.

4 Concise representation of the payoff matrix

The payoff matrix A in poker games has a diagonal block structure, where each block in turn has a natural factorization. For instance, for a four-round poker game, the payoff matrix can be written as

    …

The matrices Fi correspond to sequences of moves in round i that end with a fold. The matrix S corresponds to sequences of moves that end with a showdown. The matrices Bi encode the betting structure in round i. Finally, the matrix W encodes the win/lose/draw information determined by poker hand ranks.

We take advantage of this factorization to avoid forming the matrix A explicitly. Instead, we construct subroutines that compute the matrix-vector products x ↦ Ax and y ↦ ATy as needed in the smoothing algorithm. This concise representation of the payoff matrix yields dramatic savings in terms of the amount of space needed to store a problem instance. In particular, the concise representation for the largest game that we have handled so far requires about 40GB of memory [3]. The additional memory overhead required by the smoothing algorithm is essentially negligible. By contrast, an explicit sparse representation of this problem would require over 80,000GB of memory. A general-purpose linear programming solver, such as an interior-point algorithm, would require a substantial additional amount of memory throughout its execution.

In addition to the savings in memory requirements, the above concise representation of the payoff matrix A can be used for parallel computation. A parallel implementation of the matrix-vector subroutines achieves a nearly linear speedup [3]. This in turn has an immediate impact on the overall performance of the smoothing algorithm, because the matrix-vector products are the main bottleneck at each iteration.

The author would like to acknowledge support from NSF grant CCR-0830533, and a research grant from the Tepper School of Business.

J. Peña, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA. e-mail: [email protected]
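To make the notion of a nice prox-function concrete, the following Python/NumPy sketch works through the simplex case: for the entropy prox-function d(u) = ln m + Σ ui ln ui, subproblem (3) has the closed-form solution softmax(s/μ), and the smoothed max-form fμ under-approximates f by at most μD2 = μ ln n. This is an illustrative sketch only: the random game matrix, sizes, and μ are assumptions for the example, not the poker instance discussed above.

```python
import numpy as np

def prox_subproblem(s, mu):
    """Closed-form solution of subproblem (3) for the entropy prox-function
    d(u) = ln(m) + sum_i u_i ln u_i on the simplex:
        argmax { <s,u> - mu*d(u) : u in simplex } = softmax(s/mu)."""
    e = np.exp((s - s.max()) / mu)     # shift by max for numerical stability
    return e / e.sum()

def entropy_prox(u):
    """d(u) = ln(m) + sum u_i ln u_i, with 0*ln(0) = 0; d >= 0 on the simplex."""
    v = u[u > 0]
    return np.log(len(u)) + np.sum(v * np.log(v))

def f_smooth(A, x, mu):
    """Smoothed max-form f_mu(x) = max { <x, A y> - mu*d(y) : y in simplex },
    which evaluates to mu*log( (1/n) * sum_j exp((A^T x)_j / mu) )."""
    s = A.T @ x
    n = A.shape[1]
    return mu * np.log(np.sum(np.exp((s - s.max()) / mu))) + s.max() - mu * np.log(n)

# Sanity checks on a small random matrix game (illustrative data)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 5))
x = np.full(4, 0.25)                   # uniform strategy for Player 1
mu = 0.05
s = A.T @ x
u = prox_subproblem(s, mu)
assert u.min() >= 0 and abs(u.sum() - 1) < 1e-12   # u lies in the simplex
# u maximizes <s,u> - mu*d(u): it beats, e.g., the uniform distribution
obj = lambda w: s @ w - mu * entropy_prox(w)
assert obj(u) >= obj(np.full(5, 0.2)) - 1e-12
# f_mu under-approximates f(x) = max_j s_j by at most mu*ln(n); here D2 = ln(5)
assert f_smooth(A, x, mu) <= s.max() + 1e-12
assert s.max() <= f_smooth(A, x, mu) + mu * np.log(5) + 1e-12
```

As the text explains, [8] extends such simplex building blocks to the sets of realization plans, so the same kind of closed-form subproblem solution is available in the sequential-game setting.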

References

1. Bard, N., Bowling, M.: Particle filtering for dynamic agent modelling in simplified poker. In: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI), pp. 515–521 (2007)
2. Billings, D., Peña, L., Schaeffer, J., Szafron, D.: The challenge of poker. Artificial Intelligence 134(1-2), 201–240 (2002). Special Issue on Games, Computers and Artificial Intelligence
3. Gilpin, A., Hoda, S., Peña, J., Sandholm, T.: Gradient-based algorithms for finding Nash equilibria in extensive form games. In: Workshop on Internet and Network Economics (WINE-07) (2007)
4. Gilpin, A., Sandholm, T.: Optimal Rhode Island Hold'em poker. In: M. Veloso, S. Kambhampati (eds.) 20th National Conference on Artificial Intelligence (AAAI-05), pp. 1684–1685 (2005)
5. Gilpin, A., Sandholm, T.: Finding equilibria in large sequential games of imperfect information. In: Proceedings, ACM Conference on Electronic Commerce (EC'06), Ann Arbor, MI (2006)
6. Gilpin, A., Sandholm, T.: Expectation-based versus potential-aware automated abstraction in imperfect information games: An experimental comparison using poker. In: D. Fox, C.P. Gomes (eds.) Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, July 13-17, 2008, pp. 1454–1457. AAAI Press (2008)
7. Gilpin, A., Sandholm, T., Sørensen, T.B.: GS3 and Tartanian: game theory-based heads-up limit and no-limit Texas Hold'em poker-playing programs. In: AAMAS (Demos), pp. 1647–1648. IFAAMAS (2008)
8. Hoda, S., Gilpin, A., Peña, J.: Smoothing techniques for computing Nash equilibria of sequential games. Tech. rep., Carnegie Mellon University (2008)
9. Miltersen, P.B., Sørensen, T.B.: A near-optimal strategy for a heads-up no-limit Texas Hold'em poker tournament. In: E.H. Durfee, M. Yokoo, M.N. Huhns, O. Shehory (eds.) 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), Honolulu, Hawaii, USA, May 14-18, 2007, p. 191. IFAAMAS (2007)
10. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization 15(1), 229–251 (2004)
11. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. on Optim. 16(1), 235–249 (2005)
12. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
13. Osborne, M., Rubinstein, A.: A Course in Game Theory. MIT Press, Cambridge, MA (1994)
14. Stengel, B.V.: Efficient computation of behavior strategies. Games and Economic Behavior 14, 220–246 (1996)
15. Stengel, B.V.: Equilibrium computation for games in strategic and extensive form. In: N. Nisan, T. Roughgarden, E. Tardos, V.V. Vazirani (eds.) Algorithmic Game Theory. Cambridge University Press (2007)

D.R. FULKERSON PRIZE
Call for nominations

The Fulkerson Prize Committee invites nominations for the Delbert Ray Fulkerson Prize, sponsored jointly by the Mathematical Programming Society and the American Mathematical Society. Up to three awards are presented at each (triennial) International Symposium of the MPS. The Fulkerson Prize is for outstanding papers in the area of discrete mathematics. The Prize will be awarded at the 20th International Symposium on Mathematical Programming, to be held in Chicago, August 23-29, 2009.

Eligible papers should represent the final publication of the main result(s) and should have been published in a recognized journal, or in a comparable, well-refereed volume intended to publish final publications only, during the six calendar years preceding the year of the Symposium (thus, from January 2003 through December 2008). The prizes will be given for single papers, not series of papers or books, and in the event of joint authorship the prize will be divided.

The term "discrete mathematics" is interpreted broadly and is intended to include graph theory, networks, mathematical programming, applied combinatorics, applications of discrete mathematics to computer science, and related subjects. While research work in these areas is usually not far removed from practical applications, the judging of papers will be based only on their mathematical quality and significance.

The Fulkerson Prize Committee consists of Bill Cook, Georgia Tech, chair, Michel Goemans, MIT, and Danny Kleitman, MIT. Please send your nominations (including reference to the nominated article and an evaluation of the work) by January 15th, 2009 to the chair of the committee. Electronic submissions to [email protected] are preferred.

Further information about the Fulkerson Prize, including previous winners, can be found at www.mathprog.org/prz/fulkerson.htm and at www.ams.org/prizes/fulkerson-prize.html.

William Cook
Industrial and Systems Engineering
Georgia Tech
765 Ferst Drive
Atlanta, Georgia 30332-0205
USA
e-mail: [email protected]

BEALE-ORCHARD-HAYS PRIZE 2009
Call for nominations

Nominations are invited for the 2009 Beale-Orchard-Hays Prize for excellence in computational mathematical programming.

The Prize is sponsored by the Mathematical Programming Society, in memory of Martin Beale and William Orchard-Hays, pioneers in computational mathematical programming. Nominated works must have been published between Jan 1, 2006 and Dec 31, 2008, and demonstrate excellence in any aspect of computational mathematical programming. Computational mathematical programming includes the development of high-quality mathematical programming algorithms and software, the experimental evaluation of mathematical programming algorithms, and the development of new methods for the empirical testing of mathematical programming techniques. Full details of prize rules and eligibility requirements can be found at www.mathprog.org/prz/boh.htm.

The 2009 Prize will be awarded at the awards session of the International Symposium on Mathematical Programming, to be held August 23-28, 2009, in Chicago, Illinois, USA. Information about the Symposium can be found at www.ismp2009.org.

The 2009 Prize Committee consists of:
Erling Andersen, Mosek
Philip Gill, University of California San Diego
Jeff Linderoth, University of Wisconsin Madison
Nick Sahinidis (chair), Carnegie Mellon University

Nominations can be submitted electronically or in writing, and should include detailed publication details of the nominated work. Electronic submissions should include an attachment with the final published version of the nominated work. If done in writing, submissions should include four copies of the nominated work. Supporting justification and any supplementary material are strongly encouraged but not mandatory. The Prize Committee reserves the right to request further supporting material and justification from the nominees.

Nominations should be submitted to:
Nick Sahinidis
Department of Chemical Engineering
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
USA
e-mail: sahinidis[at]cmu.edu

The deadline for receipt of nominations is January 15, 2009.

The 20th International Symposium on Mathematical Programming will take place August 23-29, 2009 in Chicago, Illinois. The meeting will be held at the University of Chicago’s Gleacher Center and the Marriott Downtown Chicago Magnificent Mile Hotel. Festivities planned for the conference include the opening session in Chicago’s Orchestra Hall, home of the Chicago Symphony Orchestra; the conference banquet at the Field Museum, Chicago’s landmark natural history museum; and a celebration of the 60th anniversary of the Zeroth ISMP Symposium.

The invited speakers are:
Eddie Anderson, University of Sydney
Friedrich Eisenbrand, EPFL
Matteo Fischetti, University of Padova
Pablo Parrilo, MIT
Martin Skutella, Technische Universität Berlin
Éva Tardos, Cornell University
Shuzhong Zhang, Chinese University of Hong Kong
Mihai Anitescu, Argonne National Lab
András Frank, Eötvös Loránd University
Jong-Shi Pang, UIUC
Andrzej Ruszczynski, Rutgers University
David Shmoys, Cornell University
Paul Tseng, University of Washington

Papers on all theoretical, computational and practical aspects of mathematical programming are welcome. The program clusters and their organizers have been announced. Parties interested in organizing a session are encouraged to contact the cluster chairs.

Combinatorial Optimization: András Frank, Tom McCormick
Integer and Mixed-Integer Programming: Andrea Lodi, Robert Weismantel
Nonlinear Programming: Philip Gill, Philippe Toint
Nonlinear Mixed-Integer Programming: Sven Leyffer, Andreas Wächter
Complementarity and Variational Inequalities: Masao Fukushima, Danny Ralph
Conic Programming: Kim-Chuan Toh
Nonsmooth and Convex Optimization: Michael Overton, Marc Teboulle
Stochastic Optimization: Shabbir Ahmed, David Morton
Robust Optimization: Aharon Ben-Tal
Global Optimization: Christodoulos A. Floudas, Nick Sahinidis
Logistics and Transportation: Xin Chen, Georgia Perakis
Game Theory: Asu Ozdaglar, Tim Roughgarden
Telecommunications and Networks: Martin Skutella
Approximation Algorithms: Cliff Stein, Chandra Chekuri
Optimization in Energy Systems: Andy Philpott, Claudia Sagastizábal
PDE-Constrained Optimization: Matthias Heinkenschloss, Michael Hintermüller
Derivative-Free and Simulation-Based Optimization: Jorge Moré, Katya Scheinberg
Sparse Optimization: Michael Saunders, Yin Zhang
Finance and Economics: Tom Coleman, Kenneth Judd
Implementations and Software: Erling Andersen, Michal Kočvara
Variational Analysis: Boris Mordukhovich, Shawn Wang

Registration and hotel information will be available by the end of November. Further information about the symposium can be found on the conference Web site, www.ismp2009.org.

Center for Applied Optimization
401 Weil Hall
P.O. Box 116595
Gainesville, FL 32611-6595
USA

EDITOR:
Andrea Lodi
DEIS University of Bologna,
Viale Risorgimento 2,
I - 40136 Bologna, Italy
e-mail: [email protected]

CO-EDITORS:
Alberto Caprara
DEIS University of Bologna,
Viale Risorgimento 2,
I - 40136 Bologna, Italy
e-mail: [email protected]

Katya Scheinberg
IBM T.J. Watson Research Center
PO Box 218
Yorktown Heights, NY 10598, USA
[email protected]

FOUNDING EDITOR:
Donald W. Hearn

Published by the Mathematical Programming Society & University of Florida.

Journal contents are subject to change by the publisher.