INFORMS OS Today The Newsletter of the INFORMS Optimization Society

Volume 5 Number 1 May 2015

Contents
Chair's Column ...... 1
Featured Articles
  A Journey through Optimization, Dimitri P. Bertsekas ...... 3
  Optimizing in the Third World, Clóvis C. Gonzaga ...... 5
  From Lovász θ Function and Karmarkar's Algorithm to Semidefinite Programming, Farid Alizadeh ...... 9
  Local Versus Global Conditions in Polynomial Optimization, Jiawang Nie ...... 15
  Convergence Rate Analysis of Several Splitting Schemes, Damek Davis ...... 20
Nominations for OS Prizes ...... 25
Nominations for OS Officers ...... 26

Send your comments and feedback to the Editor:
Shabbir Ahmed
School of Industrial & Systems Engineering
Georgia Tech, Atlanta, GA 30332
[email protected]

Chair's Column

Suvrajeet Sen
University of Southern California
[email protected]

Dear Fellow IOS Members:

It is my pleasure to assume the role of Chair of the IOS, one of the largest and most scholarly groups within INFORMS. This year marks the Twentieth Anniversary of the formation of the INFORMS Optimization Section, which started with 50 signatures petitioning the Board to form a section. As it stands today, the INFORMS Optimization Society has matured to become one of the larger Societies within INFORMS. With seven active special interest groups, four highly coveted awards, and of course its strong presence in every INFORMS Annual Meeting, this Society is on a roll. In large part, this vibrant community owes a lot to many of the past office bearers, without whose hard work this Society would not be as strong as it is today. For the current health of our Society, I wish to thank my predecessors, and in particular my immediate predecessor Sanjay Mehrotra, under whose leadership we gathered the momentum for our conferences, a possible new journal, and several other initiatives. I also take this opportunity to express my gratitude to Jim Luedtke, Secretary/Treasurer, whose soft-spoken assurance and attention to detail keep the Society humming. Another person whose support is instrumental to many communications is our Web Editor, Pietro Belotti; thanks Pietro.

The major agenda item in the upcoming years is the possibility of a new journal, sponsored by the Society. Because of the growth in a variety of applications in Communications and Control, Data Science, and Machine Learning, there has been an explosive growth in Optimization. This field has always attracted advanced applications such as Energy, Finance, Health Care and others. What is remarkable today is that as the area matures, many advanced applications are motivating new optimization concepts, including fundamental mathematical breakthroughs (e.g., the modeling of risk), algorithmic speed-ups and robust software (e.g., convex and mixed-integer optimization), as well as new modeling paradigms (e.g., multi-agent and data-driven optimization). These are but a few of the promising directions today, and there are too many more to mention in this column.
Given the above backdrop, it is no surprise that our Society has shown overwhelming support for a new INFORMS journal devoted to Optimization. Exactly what the focus of such a journal should be is not clear at this point. There are other open questions, and we hope to resolve them as the year progresses. Our plan is to start a dialogue using INFORMS Connect as a platform for discussion, and we may even hold some meetings at affiliated conferences, such as at the International Symposium on Mathematical Programming in Pittsburgh this summer. Please be on the lookout for our calls at conferences of these affiliated groups.

Our Society is at the heart of INFORMS, and it draws tremendous strength from this sense of leadership. There is another source of strength: the caliber of scholars that the IOS attracts. In order to celebrate the accomplishments of our researchers, the IOS awards four highly coveted prizes: the Khachiyan Prize, the Farkas Prize, the Young-Researcher Prize, and the Student-paper Prize. This newsletter provides a glimpse of the personal stories of the award winners, and their unique perspectives on optimization. The awards committees for these prizes showed impeccable taste in their choices, and I wish to thank all members of these committees for their diligence with the process.

The winners of the 2014 Khachiyan Prize were Dimitri Bertsekas (MIT) and Clóvis Gonzaga (Federal University of Santa Catarina, Florianópolis, Brazil). These two titans of optimization have made multi-faceted contributions:

- Bertsekas was awarded for his pioneering role in dynamic programming (with uncountable state spaces, approximate, neuro-dynamic and approximate dynamic programming), Lagrangean methods, dual-based bounds for nonconvex problems, network optimization, and applications in communications, power, transportation and others. Moreover, his books on many of these topics have been adopted at many universities world-wide.

- Gonzaga was awarded for his leadership in interior point methods (IPM), augmented Lagrangean methods, filter methods for constrained non-linear optimization, and fast steepest descent methods. He has recently developed steepest descent algorithms for the minimization of convex quadratic functions with the same performance bound as Krylov space methods. He is also the author of several influential reviews which have introduced IPMs to scores of young researchers.

The 2014 Farkas Prize was awarded to Farid Alizadeh (Rutgers), who laid the groundwork for the field of semidefinite programming (SDP), which has grown quickly and steadily over the past 25 years. The citation continues the description of his contributions via connections between SDP and combinatorial optimization, complementarity, non-degeneracy and algorithmic methodology.

The 2014 Prize for Young Researchers was awarded to Jiawang Nie (UC San Diego) for his paper "Optimality conditions and finite convergence of Lasserre's hierarchy," Mathematical Programming, Ser. A (2014) 146:97-121, DOI: 10.1007/s10107-013-0680-x. Prior to this proof of finite convergence, the existing theory could only establish the convergence of this family of relaxations in the limit. No proof of finite convergence had been previously found, in spite of the significant numerical evidence for finite convergence that had been collected over a decade or so.

The winner of the 2014 Student Paper Prize was Damek Davis (UCLA) for his paper "Convergence rate analysis of several splitting schemes," arXiv preprint arXiv:1406.4834 (2014), with Wotao Yin.

This paper "makes a very significant contribution to the theory of optimization by using a rigorous and unified convergence rate analysis of important splitting schemes."

The awards committees for this year have been appointed, and they are listed elsewhere in this newsletter. I encourage the optimization community to identify strong candidates for these awards by sending forth nominations in each of the categories mentioned above.

Finally, I wish to welcome our newly elected Vice Chairs: Aida Khajavirad (Global Optimization), Warren Powell (Optimization Under Uncertainty), and Daniel Robinson (Nonlinear Optimization). They are joining the continuing team of Vice Chairs: Vladimir Boginski (Network Optimization), John Mitchell (Linear and Conic Optimization), Imre Polik (Computational Optimization and Software), and Juan Pablo Vielma (Integer and Discrete Optimization). These individuals are truly the reason for the strong presence of IOS at the INFORMS Annual Conferences, and I thank them for their leadership.

Before I close, I wish to also express my sincere appreciation for the work and leadership of Shabbir Ahmed. He has taken our newsletter to a whole new level, and provided it continuity for the past four years. Such dedication to the Society is what makes IOS tick. Thank you Shabbir!

I hope my enthusiasm for our society is palpable in this column, and hope that you will all get involved in the most vital society within INFORMS. Have a great summer, and see you all in either Pittsburgh or Philadelphia.

p.s.: For those interested in reading about some new areas for INFORMS researchers, follow this link, which needs you to login at INFORMS Connect:
http://connect.informs.org/communities/community-home/librarydocuments/viewdocument/?DocumentKey=d7e454e3-1872-4826-900b-7871063a5980

A Journey through Optimization

Dimitri P. Bertsekas
Lab. for Information and Decision Sciences
Massachusetts Institute of Tech., Cambridge, MA 02139
[email protected]

I feel honored and grateful for being awarded the Khachiyan Prize. It is customary in such circumstances to thank one's institutions, mentors, and collaborators, and I have many to thank. I was fortunate to be surrounded by first class students and colleagues, at high quality institutions, which gave me space and freedom to work in any direction I wished to go. It is also customary to chart one's intellectual roots and journey, and I will not depart from this tradition.

We commonly advise scholarly Ph.D. students in optimization to take the time to get a broad many-course education, with substantial mathematical content, and special depth in their research area. Then upon graduation, to use their Ph.D. research area as the basis and focus for further research, while gradually branching out into neighboring fields, and networking within the profession. This is good advice, which I often give, but this is not how it worked for me!

I came from Greece with an undergraduate degree in mechanical engineering, got my MS in control theory at George Washington University in three semesters, while holding a full-time job in an unrelated field, and finished two years later my Ph.D. thesis on control under set membership uncertainty at MIT in 1971. I benefited from the stimulating intellectual atmosphere of MIT's Electronic Systems Laboratory (later LIDS), nurtured by Mike Athans and Sanjoy Mitter, but because of my short stay there, I graduated with little knowledge beyond Kalman filtering and linear-quadratic control.

Then I went to teach at Stanford in a department that combined mathematical engineering and operations research (in which my background was rather limited) with economics (in which I had no exposure at all). In my department there was little interest in control theory, and none at all in my thesis work. Never having completed a first course in analysis, my first assignment was to teach to unsuspecting students optimization by functional analytic methods from David Luenberger's wonderful book. The optimism and energy of youth carried me through, and I found inspiration in what I saw as an exquisite connection between elegant mathematics and interesting practical problems. Studying David Luenberger's other works (including his Nonlinear Programming book) and working next door to him had a lasting effect on me.

Two more formative experiences at Stanford were studying Terry Rockafellar's Convex Analysis book (and teaching a seminar course from it), and most importantly teaching a new course on dynamic programming, for which I studied Bellman's books in great detail. My department valued rigorous mathematical analysis that could be broadly applied, and provided a stimulating environment where both could thrive. Accordingly, my course aimed to combine Bellman's vision of wide practical applicability with the emerging mathematical theory of Markov Decision Processes. The course was an encouraging success at Stanford, and set me on a good track. It has survived to the present day at MIT, enriched by subsequent developments in theoretical and approximation methodologies.

After three years at Stanford, I taught for five years in the quiet and scholarly environment of the University of Illinois. There I finally had a chance to consolidate my mathematics and optimization background, through research to a great extent. In particular, it helped a lot that with the spirit of youth, I took the plunge into the world of the measure-theoretic foundations of stochastic optimal control, aiming to expand the pioneering Borel space framework of David Blackwell, in the company of my then Ph.D. student Steven Shreve.

I changed direction again by moving back to MIT in 1979, to work in the then emerging field of data networks and the related field of distributed computation. There I had the good fortune to meet two colleagues with whom I interacted closely over many years: Bob Gallager, who coauthored with me a book on data networks in the mid-80s, and John Tsitsiklis, who worked with me first while a doctoral student and then as a colleague, and over time coauthored with me two research monographs on distributed algorithms and neuro-dynamic programming, and a probability textbook. Working with Bob and John, and writing books with them, was exciting and rewarding, and made MIT a special place for me.

Nonetheless, at the same time I was getting distracted by many side activities, such as books in dynamic programming and network optimization, getting involved in applications of queueing theory and power systems, and personally writing several network optimization codes. By that time, however, I realized that simultaneous engagement in multiple, diverse, and frequently changing intellectual activities (while not recommended broadly) was a natural and exciting mode of operation that worked well for me, and also had some considerable benefits. It stimulated the cross-fertilization of ideas, and provided perspective for more broadly integrated courses and books. Over time I settled into a pattern of closely connected activities that reinforced each other: focus on an interesting subject to generate content and fill in missing pieces, teaching to provide a synthesis and to write class notes, and writing a book to understand the subject in depth and put together a story that would make sense to others. Writing books and looking for missing pieces was an effective way to generate new research, and working on a broad range of subjects was a good way to discover interconnections, broaden my perspective, and fight stagnation.

In retrospect I was very fortunate to get into methodologies that eventually prospered. Dynamic programming developed perhaps beyond Bellman's own expectation. He correctly emphasized the curse of dimensionality as a formidable impediment in its use, but probably could not have foreseen the transformational impact of the advances brought about by reinforcement learning, neuro-dynamic programming, and other approximation methodologies. When I got into convex analysis and optimization, it was an emerging theoretical subject, overshadowed by linear, nonlinear, and integer programming. Now, however, it has taken center stage thanks to the explosive growth of machine learning and large scale computation, and it has become the lynchpin that holds together most of the popular optimization methodologies. Data networks and distributed computation were thought promising when I got involved, but it was hard to imagine the profound impact they had on engineering, as well as the world around us today. Even set membership description of uncertainty, my Ph.D. thesis subject, which was totally overlooked for nearly fifteen years, eventually came to the mainstream, and has connected with the popular areas of robust optimization, robust control, and model predictive control.

Was it good judgement or fortunate accident that steered me towards these fields? I honestly cannot say. Albert Einstein wisely told us that "Luck is when opportunity meets preparation." In my case, I also think it helped that I resisted overly lengthy distractions in practical directions that were too specialized, as well as in mathematical directions that had little visible connection to the practical world.

An academic journey must have companions to learn from and share with, and for me these were my students and collaborators. In fact it is hard to draw a distinction, because I always viewed my Ph.D. students as my collaborators. On more than one occasion, a Ph.D. thesis evolved into a book, as in the cases of Angelia Nedic and Asuman Ozdaglar, or into a long multi-year series of research papers after graduation, as in the case of Janey Yu. My collaborators were many and I cannot mention them all, but they were special to me and I was fortunate to have met them. Thank you all for sharing this exciting journey with me.

Optimizing in the Third World

Clóvis C. Gonzaga
Department of Mathematics, Federal University of Santa Catarina, Florianópolis, SC, Brazil
[email protected]

Introduction. In the year of my seventieth birthday I was amazed by being awarded the Khachiyan Prize, for which I thank the committee composed by Tamás Terlaky (chair), Daniel Bienstock, Immanuel Bomze and John Birge.

I was born in South Brazil, and raised in Joinville, Santa Catarina, a small industrial town, formerly a German settlement. I decided very soon to be an engineer (I had the knack). A local company instituted a prize: the student with the best grades in Mathematics, Physics, Chemistry and Drawing during the three years of high school, accepted in an Engineering school, got a five year scholarship. I won this prize, and passed the entry exam in the best Engineering school in the country, the Technological Institute of Aeronautics. ITA is a civilian school belonging to the Air Force, with courses in Electronic, Mechanical, Aeronautical and (nowadays) Computational Engineering. I studied Electronics, with emphasis in control and servomechanisms. ITA was founded in the fifties, organized by a team from MIT. We lived on the campus supported by the Air Force, in a very demanding system: with one grade D you were expelled, and a grade C had to be replaced by a B during the vacations (a maximum of five times). Plenty of jobs were offered to graduates.

I graduated in 1967, and went immediately to the graduate school of Engineering (COPPE) of the Federal University of Rio de Janeiro, for an MSc. COPPE and the Catholic University of Rio de Janeiro were the first Brazilian graduate schools with MSc and DSc titles in the American system. It was founded in the sixties with substantial support from the Brazilian government. We were pioneers, mostly isolated from the world, in a time when slide rules were still used, and the internet had not been imagined even by science fiction writers.

[Photo: Tamás Terlaky, Dimitri Bertsekas, Clóvis Gonzaga and Sanjay Mehrotra]

The school had an IBM 1130 computer with 32 Kbytes of memory and no floating point processor, which allowed you to process your Fortran program in punched cards once a day.

I was an MSc student with a full time job, taking care of a TR48 analog computer (young people do not know what that is), doing maintenance, operating and teaching analog computation. I started research on adaptive control and read a text on identification of a control system, based on a method called Levenberg-Marquardt (!). I got hooked. By this time an IBM researcher and Berkeley part time professor, Jean-Paul Jacob, came to Brazil for a year. Jean-Paul had a PhD from UC Berkeley, advised by Lucien Polak, and gave us a short course on non-linear programming in 1969, the first to be taught in Brazil. I wrote a dissertation with him, on "reduction of optimal control problems to non-linear programming". We had by then a group of nine quite talented students. Jean-Paul proposed to the University to start a PhD program on Systems Engineering in Brazil, and brought Lotfi Zadeh from Berkeley to organize it. We made an agreement with UC, and American professors came to Brazil to teach us advanced courses. Most of us finished the PhD in Berkeley, but I refused to go abroad.

My doctoral research was done mostly alone. I worked on two unrelated topics.

Graph search methods. Stephen Coles from Stanford visited us and taught a course on artificial intelligence, with emphasis on graph search methods, especially the A* algorithm for computing minimum cost paths using look-ahead heuristics. I used search methods for the solution of sequential decision problems, for which at this time electrical engineers knew only dynamic programming. I devised an algorithm for planning the long term expansion of power transmission systems, using a partial ordering of nodes (a preference relation) for pruning less promising paths through a graph of system configurations. This was done under a contract with the Brazilian governmental power systems agency, using a half million dollar IBM 370 computer with 256K of memory. The program took two hours of CPU, and for several years it was used in the actual planning of the Brazilian interconnected system.

I wrote a thesis on graph search methods for solving sequential decision problems, with application to the Brazilian transmission system. The results were nice, but I never tried to publish any journal paper. No tradition, no advisor, no need to publish.

Convex optimization. My main interest was in continuous optimization. I was the first Brazilian to teach a full course on non-linear programming, based on the fresh-from-the-press book by Olvi Mangasarian. I read several texts on convex analysis and point-to-set mappings, by authors like Rockafellar, Berge, Bertsekas, Geoffrion. I especially liked the idea of epsilon-subgradients, and had the idea of using this concept to assemble subgradients of a function and devise a general method for minimizing convex functions. I travelled to the US, where Jean-Paul scheduled interviews with two of my "idols", Polak in Berkeley and Geoffrion at UCLA. I told them about my ideas, and they asked the obvious question, about my adviser. I had none, and they agreed that the idea was great, but unattainable by a young man isolated in Brazil. Then Phillip Wolfe came to Brazil, and I was in charge of showing him some of Rio de Janeiro. I took him to the Sugarloaf, to Corcovado, to my home, and eventually told him about what I was doing. He said that he was doing precisely the same thing. I was delighted: Wolfe and I had the same ideas! The following day I abandoned the topic. The end of this tale was also frustrating for Wolfe: Claude Lemaréchal finished the bundle method before him, and certainly much before I would have done it, if ever.

Berkeley. In 1975 I went to Berkeley for a postdoc with my "grandfather" Lucien Polak, working on semi-infinite problems, using cutting sets for solving control systems design problems. We wrote a nice paper [10], and I had my first contact with a real research environment. In my first technical interview with him, he was in his office with another professor, unknown to me. Extremely nervous, I stuttered "I read your paper with Prof. Mayne, and... and...". "Did you find an error?" I said "y-y-yes." And he said "Show us. By the way, meet David Mayne, from Imperial College". Resisting the temptation to faint, I went to the blackboard and showed the mistake. He asked if I knew how to do it right, I did, and this was the beginning of a life-time friendship.

Back in Brazil, I worked on the long term planning of the operation of hydro-thermal power systems, using stochastic dynamic programming, and on the beautiful theory of water value. I went to many congresses, met many researchers in discrete systems, combinatorics and power planning, but again published no journal papers. The political situation in Brazil deteriorated under a violent military dictatorship. Our salaries went down and research activities became irrelevant.

Interior points. In 1985 Lucien Polak invited me to replace him at UC during a sabbatical year. I got a visiting professor position, and stayed in Berkeley for two and a half years, teaching courses on automatic control. Karmarkar's paper had just appeared, and I got very interested. He had studied with Richard Karp, and was an expert in combinatorics, complexity theory and linear programming, with no background in NLP, I believe.

After much effort in rephrasing his quite obscure method, I saw that it could be reduced to a scaled steepest descent method applied to his potential function, and wrote a paper on this. Then I realized that he had not used the full potential of Newton's method, and that it should be possible to improve his result. I worked day and night, found dozens of ways of not improving his complexity results, got very depressed, went to a Chinese restaurant, and the fortune cookie said "You will solve a problem that will be very important in your life." I had already decided for the logarithmic barrier function (instead of the potential function). I went home and started dealing with the third derivatives, leading to the large quadratic convergence region needed to build the path following method represented by the inchworm in the figure. Properties of path following methods had already been obtained by Megiddo and by Renegar. I presented my result in a workshop organized by Nimrod Megiddo in Asilomar, California, in the beginning of 1987, and the paper was published in the workshop proceedings. All references about these early results are in [4].

[Figure 1: The inchworm]
My life changed instantly. I met all the main brains in the field, and saw myself in the middle of the interior point revolution. A bunch of extremely clever guys like Anstreicher, Adler, Goldfarb, Kojima, McCormick, Megiddo, Mizuno, Monteiro, Nemirovsky, Nesterov, Renegar, Roos, Todd, Vial, Ye (forgive me if I forgot your name) travelled around the world like rodeo riders, mounting a new horse at each congress, with results that became obsolete in a matter of months. In 1992 I published the state of the art paper [4] in SIAM Review, later endowed with the "citation classics award" for the most cited paper in Mathematics and Computer Sciences written by a Brazilian in the 90's decade.

Back in Rio de Janeiro in 1987 I kept working on interior point methods, alone in Brazil, sending and receiving technical reports by regular mail and travelling frequently abroad. Brazil was again learning how to be a democracy. When computer communications arrived, science in the third world became feasible.

I spent 1993 at INRIA in Paris, working with Frédéric Bonnans [2, 5] and Jean-Charles Gilbert [3], and worked for a semester at TU Delft, in the company of Roos and Terlaky.

In 1994 I moved back to South Brazil, transferred (with a group of DSc students) to the Mathematics Department of the Federal University of Santa Catarina in Florianópolis, a city on a beautiful island surrounded by 40 beaches, where we have organized two international workshops (clovisfest 60 and 70, in 2004 and 2014, with a plan for another one in 2024, maybe...). I worked on linear complementarity problems, augmented Lagrangian methods, Nesterov optimal descent methods, filter methods, and lately on the complexity of descent methods for quadratic minimization. I wrote papers with my former students Hugo Lara [13], Luis C. Matioli [14], Romulo Castillo [6], Ademir Ribeiro [12, 16], Diane Rossetto [8], Marcia Vanti [9], Roger Behling [1], Fernanda Raupp [15], and Marli Cardia, and have a long lasting collaboration with Elizabeth Karas [7, 8, 9] from the Federal University of Paraná in Brazil.

I am a member of the Brazilian Academy of Sciences and of TWAS. I am a SIAM fellow, I received the Great Cross of the Brazilian Order of Scientific Merit, and now this very prestigious prize in honour of Leonid Khachiyan.

Now I am retired, leisurely working in Florianópolis as a voluntary researcher. I thank my students, old and new [11], for the joy they brought to my life in these many years.

REFERENCES

[1] R. Behling, C. C. Gonzaga, and G. Haeser. Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization. Journal of Optimization Theory and Applications, 162:705–717, 2013.

[2] F. Bonnans and C. C. Gonzaga. Convergence of interior point algorithms for monotone linear complementarity problems. Mathematics of Operations Research, 21:1–25, 1996.

[3] J. C. Gilbert, C. C. Gonzaga, and E. Karas. Examples of ill-behaved central paths in convex optimization. Mathematical Programming, 103(1):63–94, 2005.

[4] C. C. Gonzaga. Path following methods for linear programming. SIAM Review, 34(2):167–227, 1992.

[5] C. C. Gonzaga and F. Bonnans. Fast convergence of the simplified largest step path following algorithm. Mathematical Programming, 76:95–116, 1997.

[6] C. C. Gonzaga and R. Castillo. A nonlinear programming algorithm based on non-coercive penalty functions. Mathematical Programming, 96:87–101, 2003.

[7] C. C. Gonzaga and E. W. Karas. Fine tuning Nesterov's steepest descent algorithm for differentiable convex programming. Mathematical Programming, 138(1–2):141–166, 2013.

[8] C. C. Gonzaga, E. W. Karas, and D. R. Rossetto. An optimal algorithm for constrained differentiable convex optimization. SIAM Journal on Optimization, 2013. To appear.

[9] C. C. Gonzaga, E. W. Karas, and M. Vanti. A globally convergent filter method for nonlinear programming. SIAM Journal on Optimization, 14(3):646–669, 2003.

[10] C. C. Gonzaga and E. Polak. On constraint dropping schemes and optimality functions for a class of outer approximation algorithms. SIAM Journal on Control and Optimization, 17, 1979.

[11] C. C. Gonzaga and R. Schneider. Improving the efficiency of steepest descent algorithms for minimizing convex quadratics. Technical report, Federal University of Santa Catarina, Brazil, February 2015.

[12] E. W. Karas, C. C. Gonzaga, and A. A. Ribeiro. Local convergence of filter methods for nonlinear programming. Optimization, 59:1153–1171, 2010.

[13] H. Lara and C. C. Gonzaga. A note on properties of condition numbers. Linear Algebra and its Applications, 261:269–273, 1997.

[14] L. C. Matioli and C. C. Gonzaga. A new family of penalties for augmented Lagrangian methods. Numerical Linear Algebra with Applications, 15:925–944, 2008.

[15] F. M. Raupp and C. C. Gonzaga. A center cutting plane algorithm for a likelihood estimate problem. Computational Optimization and Applications, 21:277–300, 2000.

[16] A. A. Ribeiro, E. W. Karas, and C. C. Gonzaga. Global convergence of filter methods for nonlinear programming. SIAM Journal on Optimization, 19:1231–1249, 2008.

From Lovász θ Function and Karmarkar's Algorithm to Semidefinite Programming

Farid Alizadeh
School of Business, Rutgers University, Piscataway, NJ 08854
[email protected]

It is truly an honor, and quite an unexpected one, to be awarded the Farkas Prize, especially considering the remarkable people who have received this award in the past.

Below I will give an account of how I was steered into the study of semidefinite programming by a brilliant construction of Lovász.

1. The marvelous θ function

While studying as a graduate masters student at the University of Nebraska–Lincoln in 1987, I came across the little book of Lovász [27] (see also the earlier [26]). In it there was a derivation of a graph invariant which was so simple and yet astonishingly beautiful. Here is the idea: Suppose a graph has a clique (a subset of vertices all connected to each other) of size k. Construct a symmetric matrix A(x), where the (i, j) entry is 1 if vertices i and j are connected in the graph or i = j; otherwise set [A(x)]_ij = [A(x)]_ji = x_ij, a real-valued free variable.

From elementary linear algebra, we know that the largest eigenvalue of a principal submatrix is bounded by the largest eigenvalue of the matrix itself. So, since the submatrix corresponding to the vertices of the clique is a k × k matrix of all ones, and since the sole nonzero eigenvalue of such a matrix is equal to k, it follows that k ≤ λ_1(A(x)). In particular, the size of the largest clique satisfies ω(G) ≤ min_x λ_1(A(x)).

What is amazing is that the right hand side of this inequality, known as the Lovász θ function, is the solution of a convex optimization problem, even though there are no "neat" formulas for eigenvalues of a matrix. As a result, the θ invariant of a graph can be estimated in polynomial time by the ellipsoid method.

Consider this example:

[Figure: a five-vertex graph with vertices 1–5 and edges {1,2}, {1,3}, {2,3}, {2,5}, {3,4}, {4,5}]

The matrix of this graph, as described above and with x = (x14, x15, x24, x35), is given by:

          | 1    1    1    x14  x15 |
          | 1    1    1    x24  1   |
  A(x) =  | 1    1    1    1    x35 |
          | x14  x24  1    1    1   |
          | x15  1    x35  1    1   |

[Photo: Yinyu Ye, Farid Alizadeh and Sanjay Mehrotra]
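Nothing in this derivation requires special machinery: A(x) is affine in x and the largest eigenvalue of a symmetric matrix is a convex function, so the bound min_x λ_1(A(x)) can be estimated with a generic minimizer. A minimal sketch, assuming NumPy and SciPy are available (the helper names are ours, not from the article):

```python
import numpy as np
from scipy.optimize import minimize

# Free variables sit on the non-edges of the example graph:
# x = (x14, x15, x24, x35); 1s on the diagonal and on the edges.
def A(x):
    x14, x15, x24, x35 = x
    return np.array([
        [1.0, 1.0, 1.0, x14, x15],
        [1.0, 1.0, 1.0, x24, 1.0],
        [1.0, 1.0, 1.0, 1.0, x35],
        [x14, x24, 1.0, 1.0, 1.0],
        [x15, 1.0, x35, 1.0, 1.0],
    ])

# lambda_max of an affine symmetric matrix function is convex, so a
# derivative-free local search on this tiny instance reaches the
# global minimum, i.e., the theta bound.
lam_max = lambda x: np.linalg.eigvalsh(A(x))[-1]
res = minimize(lam_max, np.zeros(4), method="Nelder-Mead")
print(res.fun)  # >= omega(G) = 3, since vertices 1, 2, 3 form a clique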

2. Karmarkar's algorithm

Completely independent of the θ function, I also got interested in the (then) new Karmarkar's algorithm for linear programming (LP). While the ellipsoid method was known to solve LPs in polynomial time, it had not proved to be useful in practice. Karmarkar's algorithm was both polynomial-time and practical. I read Karmarkar's paper [24], and subsequently attended the SIAM Optimization Conference in Houston in 1987. Many major players in interior point methods, including Ilan Adler, Don Goldfarb, Clovis Gonzaga, Narendra Karmarkar, Michael J. Todd and Yinyu Ye, were presenting talks.

Right brain meets left brain: Can we extend Karmarkar's algorithm to compute the θ function?

Driving back from Houston to Lincoln, and juggling in my mind what I had learned at the SIAM conference, I thought of a crazy idea: Can we apply interior point methods to compute the θ function? After transferring to Minnesota in 1988 to study with the late Ben Rosen, I worked on this idea and realized that it was not so crazy after all. Applying the logarithmic barrier to the optimization problem characterizing the θ function we get:

  min_{z,x}  z − µ Σ_i ln( z − λ_i(A(x)) )
    = min_{z,x}  z − µ ln Π_i ( z − λ_i(A(x)) )
    = min_{z,x}  z − µ ln P_{A(x)}(z),

where P_{A(x)}(z) is the characteristic polynomial of A(x). Now, the last line can be easily computed using unconstrained optimization techniques based on Newton's, or quasi/truncated Newton methods. Using the array of supercomputers available at Minnesota, especially the Cray-2, and the NAG library optimization software, I could use the above formulation and compute θ of graphs with hundreds of vertices and a few thousand edges.

3. Onward to semidefinite programming

In the Spring of 1990 I got hold of the wonderful book by Grötschel, Lovász and Schrijver [21]. To my delight an entire chapter was devoted to the θ function. A particular "dual" formulation caught my eye:

  θ(G) = max  J • Y
         s.t. Y_ij = 0 for all ij edges in Ḡ,
              trace(Y) = 1,  Y ⪰ 0,

where A • B = Σ_ij A_ij B_ij, and Y ⪰ 0 means Y is a symmetric positive semidefinite matrix. Looking at this characterization, I realized that it resembled the so-called "standard-form" LP: min { c^T x | Ax = b, x ≥ 0 }; only the nonnegative orthant was replaced by the positive semidefinite cone. Based on this observation I came up with a strategy to extend design and analysis of interior point methods:

1. First, I would focus my attention not on the θ function per se, but on a more general optimization problem and its dual:

     min  C • X             max  b^T y
     s.t. A_i • X = b_i     s.t. Σ_i y_i A_i + S = C
          X ⪰ 0                  S ⪰ 0

   (This was the time I thought of the term semidefinite programming (SDP) for this problem¹.)

2. Next, I would take one of literally hundreds of papers on the complexity of various interior point methods for LP, and try to extend it, word for word, to SDP.

¹ The term "positive (semi)-definite programming" had been used much earlier by Dantzig and Cottle to describe a different problem: the linear complementarity problem with a semidefinite matrix [14]. I was not aware of this at the time, and in any case the term had not caught on.

My intuition was based on the belief that Karmarkar's, and other interior point methods, did not rely in significant ways on the combinatorial nature of linear programming (the way the simplex methods did, for instance). So techniques for their analysis should be extensible to the more general setting above.
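In modern notation, the dual characterization above is a small SDP that today's modeling tools solve directly, a convenience that did not exist in 1990. A hedged sketch for the same five-vertex example, assuming the CVXPY package is installed:

```python
import cvxpy as cp

n = 5
# Non-edges of the example graph (0-based): exactly where the free
# variables x14, x15, x24, x35 sit in A(x).
nonedges = [(0, 3), (0, 4), (1, 3), (2, 4)]

Y = cp.Variable((n, n), symmetric=True)
constraints = [Y >> 0, cp.trace(Y) == 1]
# Y vanishes on the edges of the complement graph, i.e., the non-edges of G.
constraints += [Y[i, j] == 0 for (i, j) in nonedges]

prob = cp.Problem(cp.Maximize(cp.sum(Y)), constraints)  # J . Y
prob.solve()
print(prob.value)  # agrees with min_x lambda_max(A(x)); >= omega(G) = 3
```

By SDP duality this maximization has the same optimal value as the eigenvalue minimization sketched earlier, which is the sense in which the two characterizations are "dual."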

In the meantime, around May of 1990, Yinyu Ye, who was at the University of Iowa at the time, visited Minnesota. We had an hour long conversation and I discussed my project with him. Yinyu was very interested and encouraged me to pursue the project. He also suggested that I use one of the algorithms he had developed, which were based on the Todd-Ye-Mizuno potential function.

I spent the next few weeks studying Ye's two papers [48, 47], which presented two potential reduction algorithms for linear programming with iteration complexity O(√n L). While reading these papers, I realized that in almost every place that the vector x_k (the current estimate for the optimal solution) occurs as a parameter of a nonlinear function, that function is symmetric in x_k. For instance, in the potential function given by c^T x − q Σ_i ln x_i, the term Σ_i ln x_i is symmetric. So are expressions like ||x_k − 1||, where 1 is the vector of all ones, and ||·|| corresponds to either the Euclidean norm, the maximum (∞) norm or the 1-norm. So it was straightforward to translate statements involving such functions to SDP. All I had to do was to replace x_k (in LP) with the eigenvalues of X_k (in SDP). For instance, an inequality used by Ye, and earlier by Karmarkar, was the following: for 0 < x < 1 (componentwise),

  Σ_i ln x_i  ≥  Σ_i x_i − n − ||x − 1||² / ( 2 (1 − ||x − 1||_∞) ).

This inequality is easily proved using the Taylor expansion of ln(1 − x). Now let X be a symmetric matrix and let λ_i(X) be its eigenvalues. Then 0 < λ_i(X) < 1 for all i is equivalent to 0 ≺ X ≺ I. And the inequality above, upon replacing x_i by λ_i(X), becomes

  ln Det(X)  ≥  trace(X) − n − ||X − I||_F² / ( 2 (1 − ρ(X − I)) ),

where ρ(A) is the spectral radius of A (the largest eigenvalue norm, max_i |λ_i(A)|, for symmetric matrices).
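This scalar-to-matrix translation is easy to spot-check numerically. A minimal sketch, assuming NumPy, that samples a random X with 0 ≺ X ≺ I and tests the determinant inequality above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random symmetric X with eigenvalues in (0, 1), i.e., 0 < X < I.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(0.05, 0.95, size=n)
X = Q @ np.diag(lam) @ Q.T

lhs = np.log(np.linalg.det(X))
rho = np.max(np.abs(lam - 1.0))  # spectral radius of X - I
rhs = (np.trace(X) - n
       - np.linalg.norm(X - np.eye(n), "fro") ** 2 / (2 * (1 - rho)))
print(lhs >= rhs)  # True: the matrix inequality holds
```

The check works because both sides are spectral functions: the left side is the sum of ln λ_i, the Frobenius norm collects the (λ_i − 1)², and each scalar inequality is then summed over the eigenvalues.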

Another key ingredient of LP interior point methods was some form of linear or projective transformation that mapped the nonnegative orthant back to itself and brought the current interior feasible point x_k to 1. For instance, a linear transformation achieving this would be x → Diag(x_k)^{-1} x. Typically a number of identities and inequalities are proved for this transformed vector, ultimately resulting in a fixed reduction in a potential function. The analogous transformation for SDP seemed to be X → X_k^{-1} X, which maps the current interior feasible point X_k to I, the identity matrix. As in the inequality above, I could extend all the needed identities and inequalities involving Diag(x_k)^{-1} x in Ye's papers to the spectrum of X_k^{-1} X. As a result I could prove a fixed reduction in an analogous matrix potential function. So my strategy of "word-for-word" turning LP interior point algorithms and their analyses into SDP ones seemed to work.

Of course there was a major flaw in all this which made the whole thing somewhat absurd: while in LP the transformation x → Diag(x_k)^{-1} x mapped the nonnegative orthant back to itself, the corresponding X → X_k^{-1} X did not map the semidefinite cone to itself; the product of two symmetric matrices is not even a symmetric matrix in general.

I was stuck on this issue for a few weeks, until I decided to pay a visit to Yinyu and seek his advice. In the summer of 1990, I drove five hours from Minneapolis to Iowa City. Yinyu was gracious to treat me to pizza, and even more gracious afterwards to listen to my construction and the dilemma about the non-symmetry of X_k^{-1} X. As soon as I described the problem, and without hesitation, Yinyu proposed that I use instead the transformation X → X_k^{-1/2} X X_k^{-1/2}. The moment he uttered these words I knew that the problem was solved. X → X_k^{-1/2} X X_k^{-1/2} is an automorphism of the semidefinite cone. At the same time it has the same spectrum as X_k^{-1} X, so all the identities and inequalities I had already proved for its spectrum remained valid for this transformation as well.
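Ye's substitution can be checked in a few lines: X_k^{-1/2} X X_k^{-1/2} is symmetric and is similar to X_k^{-1} X, so the two share a spectrum. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)  # symmetric positive definite

Xk, X = random_spd(5), random_spd(5)

# Eigenvalues of the (non-symmetric) product Xk^{-1} X ...
ev_raw = np.sort(np.linalg.eigvals(np.linalg.solve(Xk, X)).real)

# ... agree with those of the symmetric scaling Xk^{-1/2} X Xk^{-1/2},
# which does map the semidefinite cone onto itself.
w, V = np.linalg.eigh(Xk)
Xk_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
ev_sym = np.sort(np.linalg.eigvalsh(Xk_inv_sqrt @ X @ Xk_inv_sqrt))

print(np.allclose(ev_raw, ev_sym))  # True: similar matrices, same spectrum
```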
I should mention here that Yinyu in his earlier visit to Minnesota also brought to my attention a preliminary version of Nesterov and Nemirovski's monumental work [32] (as an e-book on a floppy, in 1990!). I did not quite understand the book, but fortunately there was a workshop in early summer of 1990 in Madison, Wisconsin, which both Yuri and Arkadi attended. I talked to them about my work on the θ function, and they re-assured me that their self-concordance theory implied that θ can be estimated in O(√n) iterations. Based on this discussion I quickly wrote a paper [2] on randomized parallel computation of θ, and became confident that my "word-for-word" approach had a strong chance of success.

4. Further research on applications of SDP

After writing my dissertation [1] and a few papers based on the research above [3, 4], I spent two years (1992-1994) as an NSF postdoctoral associate at the International Computer Science Institute at the University of California–Berkeley. I was extremely privileged to work with Dick Karp on applications of combinatorial optimization to molecular biology. I should also mention my joyful collaboration with Andrew Goldberg in implementing the push-relabel method for the maximum-flow problem on the CM-2 architecture [6].

At this time I also got to know Ilan Adler, and through interactions with him got interested both in primal-dual algorithms for SDP and in second order cone programming. My interest in SOCP arose after reading a preliminary version of a paper of Nemirovski and Scheinberg [31] which essentially made a "word-for-word" extension of Karmarkar's algorithm from LP to SOCP.

In 1994 I moved to RUTCOR at Rutgers University. Below I briefly outline my activities in the past twenty years:

1994-1998: Primal-dual algorithms. I started my collaboration with Jean-Pierre Haeberly of (then) Fordham, and Michael Overton of New York University. After studying the primal-dual method of Helmberg et al. [23] we developed our own method, which computed Newton's direction for primal and dual feasibility and the complementarity relation (XS + SX)/2 = µI [11]; this direction was termed by the community as the AHO direction. We also developed SDPpack, the first publicly available software package that could solve optimization problems with any combination of linear, second order and semidefinite constraints [8]. Later, in a seminal work, Monteiro and Zhang extended the AHO method by adding an automorphic scaling of the semidefinite cone [30]. Renato Monteiro showed that under all such scalings, a "short-step" approach yields O(√n |ln ε|) iteration complexity [29]. The Monteiro-Zhang family includes the fundamental Nesterov-Todd methods [33, 34] and the Helmberg et al. method mentioned above.

1995-1997: Degeneracy in SDP. With Jean-Pierre and Michael, we also studied the notion of (non)degeneracy and strict complementarity in SDP [10].
This work was further expanded by Pataki [43] and Faybusovich [17].

1998-2003: Jordan algebras. Through the work of Güler [22], Faybusovich [17, 16, 18], and the brilliant texts of Faraut and Korányi [15] and Koecher [28], Euclidean Jordan algebras became accessible to the optimization community. With Stefan Schmieta we showed that just about anything said about the Monteiro-Zhang family can be extended verbatim to optimization over symmetric cones [44, 45]. We also showed that the word-for-word extension of Ye's algorithm in LP can be carried all the way to symmetric cones [12]. With Yu Xia we extended the Q method (developed earlier with Haeberly and Overton [9]) to symmetric cones as well [13].

2003-2011: Shape-constrained statistical estimation. With my colleague at Rutgers, Jonathan Eckstein, and former students Gabor Rudolf and Nilay Noyan, we applied SDP to estimate the (nonnegative) arrival rate of nonhomogeneous Poisson processes, and used our approach to estimate the arrival rate of e-mails [5]. And with David Papp we estimated the two dimensional arrival rate of accidents on the New Jersey Turnpike as a function of both time and mileage post [39]. Also with David we used similar techniques for shape constrained regression and probability density estimation [40].

2008-2014: Nonnegative and SOS polynomials. As a result of the work of N. Z. Shor [46], and later Lasserre [25], Parrilo [42, 41] and Nesterov [35], it became evident that the set of polynomials which are expressible as sums of squares (SOS) of other polynomials is SD-representable. With David Papp, we studied the notion of sum-of-squares in abstract algebras and, building on the work of Nesterov [35] and Faybusovich [19], showed SDP-representability of SOS cones in the most general setting possible [40]. With RUTCOR students Gabor Rudolf, Nilay Noyan, and David Papp, we showed that the complementarity conditions for the cone of univariate nonnegative polynomials and its dual, the moment cone, cannot be expressed by more than four independent bilinear relations (unlike the nonnegative orthant and other symmetric cones, where complementarity can be expressed by n bilinear relations). In the process we introduced the notion of the bilinearity rank of cones. Later, Gowda and Tao showed that the bilinearity rank of a cone is exactly the dimension of the Lie algebra of its automorphism group [20].

5. Final word

I am fortunate to have been associated with great mentors, colleagues, and students throughout the years. Early in my career, my advisor at Nebraska, David Klarner, sparked the love of algorithms in me. And Ben Rosen at Minnesota had the confidence to agree to support me after just a ten minute conversation.

Michael Overton, who had already extensively studied eigenvalue optimization [36, 38, 37], did substantial hand-holding in the early stages of my investigation. I also had stimulating discussions, both in person and by e-mail, with Arkadi Nemirovski and Yuri Nesterov (who read the first drafts of my dissertation and pointed out several sloppy errors). Laci Lovász was gracious and encouraged me to pursue my ideas about the θ function and its efficient computation. Gene Golub was kind in allowing me to use Stanford student offices, and also in having many discussions about numerical stability of eigenvalue optimization algorithms.

Over the years, discussions with Stephen Boyd, Gabor Pataki, Mutakuri Ramanna, Michel Goemans, David Williamson and Garud Iyengar were very fruitful. In addition, I had a great sabbatical leave at the Columbia IEOR department during the calendar year 2001. A result of this visit was a very satisfying collaboration with Don Goldfarb which resulted in our paper on SOCP [7].

Finally, I should acknowledge the great pleasure I have had working with (former) graduate students Reuben Sattergren, Stefan Schmieta, Yu Xia, Gabor Rudolf, Nilay Noyan, David Papp, and RUTCOR and Rutgers MSIS (current) graduate students Marta Cavaleiro, Mohammad Ranjbar and Deniz Seyed Eskandani.

REFERENCES

[1] F. Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, Computer Science Department, University of Minnesota, Minneapolis, Minnesota, 1991.

[2] F. Alizadeh. A sublinear-time randomized parallel algorithm for the maximum clique problem in perfect graphs. In Proc. 2nd ACM-SIAM Symposium on Discrete Algorithms, 1991.

[3] F. Alizadeh. Optimization over positive semi-definite cone; interior-point methods and combinatorial applications. In P. Pardalos, editor, Advances in Optimization and Parallel Computing. North-Holland, Amsterdam, 1992.

[4] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim., 5(1):13–51, 1995.

[5] F. Alizadeh, J. Eckstein, N. Noyan, and G. Rudolf. Arrival rate approximation by nonnegative cubic splines. Operations Research, 56(1):140–156, 2008.

[6] F. Alizadeh and A. V. Goldberg. Implementing the push-relabel method for the maximum flow problem on a Connection Machine. In D. S. Johnson and C. C. McGeoch, editors, Network Flows and Matching, First DIMACS Implementation Challenge, volume 12 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, 1993.

[7] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Programming Series B, 95:3–51, 2003.

[8] F. Alizadeh, J.-P. A. Haeberly, M. V. Nayakkankuppam, M. L. Overton, and S. A. Schmieta. SDPpack user guide (version 0.9 beta). Technical Report 737, Courant Institute of Mathematical Sciences, New York University, New York, NY, 1997. (Available at http://www.cs.nyu.edu/faculty/overton/sdppack.)

[9] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. A new primal-dual interior point method for semidefinite programming. In Proc. 5th SIAM Conference on Applied Linear Algebra, Snowbird, Utah, 1994.

[10] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Math. Programming, 77(2), 1997.

[11] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim., 8(3):746–768, 1998.
[12] F. Alizadeh and S. H. Schmieta. Symmetric cones, potential reduction methods and word-by-word extensions. In R. Saigal, L. Vandenberghe, and H. Wolkowicz, editors, Handbook of Semidefinite Programming: Theory, Algorithms and Applications, pages 195–233. Kluwer Academic Publishers, 2000.

[13] F. Alizadeh and Y. Xia. The Q method for symmetric cone programming. Journal of Optimization Theory and Applications, 149(1):102–137, 2010.

[14] G. B. Dantzig and R. W. Cottle. Positive (semi-)definite programming. In J. Abadie, editor, Nonlinear Programming, pages 55–73. North Holland, Amsterdam, 1967.

[15] J. Faraut and A. Korányi. Analysis on Symmetric Cones. Oxford University Press, Oxford, UK, 1994.

[16] L. Faybusovich. Euclidean Jordan algebras and interior-point algorithms. Positivity, 1(4):331–357, 1997.

[17] L. Faybusovich. Linear systems in Jordan algebras and primal-dual interior point algorithms. Journal of Computational and Applied Mathematics, 86:149–175, 1997.

[18] L. Faybusovich. A Jordan-algebraic approach to potential-reduction algorithms. Math. Z., 239:117–129, 2002.

[19] L. Faybusovich. On Nesterov's approach to semi-infinite programming. Acta Appl. Math., 74(2):195–215, 2002.

[20] S. Gowda and J. Tao. On the bilinearity rank of a proper cone and Lyapunov-like transformations. Math. Programming Series A, 147:155–170, 2014.

[21] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer Verlag, 1988.

[22] O. Güler. Barrier functions in interior point methods. Math. of Oper. Res., 21:860–885, 1996.

[23] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An interior-point method for semidefinite programming. SIAM J. Optim., 6:342–361, 1996.

[24] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.

[25] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11(3):796–817, 2001.

[26] L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory, 25(1):1–7, January 1979.

[27] L. Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity. CBMS-NSF 50. SIAM, 1986.

[28] M. Koecher. The Minnesota Notes on Jordan Algebras and Their Applications. Springer-Verlag, 1999. Edited by Aloys Krieg and Sebastian Walcher, based on lectures given at the University of Minnesota, 1960.
[29] R. D. C. Monteiro. Polynomial convergence of primal-dual algorithms for semidefinite programming based on the Monteiro and Zhang family of directions. SIAM J. Optim., 8:797–812, 1998.

[30] R. D. C. Monteiro and Y. Zhang. A unified analysis for a class of path-following primal-dual interior-point algorithms for semidefinite programming. Mathematical Programming, 81:281–299, 1998.

[31] A. Nemirovski and K. Scheinberg. Extension of Karmarkar's algorithm onto convex quadratically constrained quadratic programming. Math. Programming, 72:273–289, 1996.

[32] Y. Nesterov and A. Nemirovski. Self-concordant functions and polynomial time methods in convex programming. Central Economical and Mathematical Institute, U.S.S.R. Academy of Science, Moscow, 1990.

[33] Y. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-point methods for convex programming. Math. of Oper. Res., 22:1–42, 1997.

[34] Y. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim., 8:324–364, 1998.

[35] Yu. Nesterov. Squared functional systems and optimization problems. In H. Frenk, K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization, Appl. Optim., pages 405–440. Kluwer, Dordrecht, Netherlands, 2000.

[36] M. L. Overton. On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Matrix Anal. Appl., 9(2):256–268, 1988.

[37] M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Anal. Appl., 13:41–45, 1992.

[38] M. L. Overton and R. S. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Programming, 62(2):321–357, 1993.

[39] D. Papp and F. Alizadeh. Multivariate arrival rate estimation by sum-of-squares polynomial splines and semidefinite programming. In Proc. 2011 Winter Simulation Conference, 2011.

[40] D. Papp and F. Alizadeh. Shape-constrained estimation using nonnegative splines. J. Comput. Graph. Statist., 23(1):211–231, 2014.

[41] P. A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293–320, 2003.

[42] P. A. Parrilo. Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000.

[43] G. Pataki. On the rank of extreme matrices in semidefinite programming and the multiplicity of optimal eigenvalues. Math. of Oper. Res., 23(2):339–358, 1998.

[44] S. H. Schmieta and F. Alizadeh. Associative and Jordan algebras, and polynomial time interior-point algorithms for symmetric cones. Math. of Oper. Res., 26(3):543–564, 2001.

[45] S. H. Schmieta and F. Alizadeh. Extension of commutative class of primal-dual interior point algorithms to symmetric cones. Math. Programming, 96:409–438, 2003.

[46] N. Z. Shor. A class of global minimum bounds of polynomial functions. Cybernetics, 23(6):731–734, 1987.

[47] Y. Ye. A class of projective transformations for linear programming. SIAM J. Comput., 19(3):457–466, 1990.

[48] Y. Ye. An O(n³L) potential reduction algorithm for linear programming. Math. Programming, 50(2):239–258, 1991.

Local Versus Global Conditions in Polynomial Optimization

Jiawang Nie
Department of Mathematics, University of California San Diego, La Jolla, CA 92093, USA
[email protected]

This paper briefly reviews the relationship between local and global optimality conditions in [15]. Consider the polynomial optimization problem

  min  f(x)
  s.t. h_i(x) = 0  (i = 1, ..., m_1),        (1)
       g_j(x) ≥ 0  (j = 1, ..., m_2),

where f, h_1, ..., h_{m_1}, g_1, ..., g_{m_2} are real polynomials in x := (x_1, ..., x_n). For convenience, denote

  h := (h_1, ..., h_{m_1}),   g := (g_1, ..., g_{m_2}),

and g_0 := 1. Let K be the feasible set of (1). When there are no equality (resp., inequality) constraints, the tuple h = ∅ and m_1 = 0 (resp., g = ∅ and m_2 = 0).

The problem (1) can be treated as a general nonlinear program. By classical nonlinear optimization methods, we can typically get a Karush-Kuhn-Tucker (KKT) point of (1). Theoretically, it is NP-hard to check whether a KKT point is a local minimizer or not. However, it is often not too hard to do that in practice. This is because there exist standard conditions ensuring local optimality. On the other hand, it is often much harder to get a global minimizer. In practice, sometimes we may be able to get a global optimizer, but it is typically hard to verify the global optimality. A major reason for this is the lack of easily checkable global optimality conditions in nonlinear programming theory.

Local and global optimality conditions are presumably very different, except in special cases like convex optimization. For general nonconvex optimization, little is known about global conditions. However, for polynomial optimization, this is possible by using representations of nonnegative polynomials. Interestingly, global optimality conditions are closely related to the local ones, which was discovered in the paper [15].

[Photo: Sanjay Mehrotra, Jiawang Nie and Andrzej Ruszczyński]

1. Local Optimality Conditions

Let u be a local minimizer of (1) and let

  J(u) := { j_1, ..., j_r }

be the index set of active inequality constraints. If the constraint qualification condition (CQC) holds at u, i.e., the gradient vectors

  ∇h_1(u), ..., ∇h_{m_1}(u), ∇g_{j_1}(u), ..., ∇g_{j_r}(u)

are linearly independent, then there exist Lagrange multipliers λ_1, ..., λ_{m_1} and µ_1, ..., µ_{m_2} satisfying

  ∇f(u) = Σ_{i=1}^{m_1} λ_i ∇h_i(u) + Σ_{j=1}^{m_2} µ_j ∇g_j(u),        (2)

  µ_1 g_1(u) = ··· = µ_{m_2} g_{m_2}(u) = 0,        (3)

  µ_1 ≥ 0, ..., µ_{m_2} ≥ 0.

The equation (2) is called the first order optimality condition (FOOC), and (3) is called the complementarity condition. If it further holds that

  µ_1 + g_1(u) > 0, ..., µ_{m_2} + g_{m_2}(u) > 0,        (4)

then the strict complementarity condition (SCC) holds at u. The strict complementarity is equivalent to µ_j > 0 for every j ∈ J(u). Let L(x) be the Lagrange function

  L(x) := f(x) − Σ_{i=1}^{m_1} λ_i h_i(x) − Σ_{j∈J(u)} µ_j g_j(x).

Clearly, (2) implies that the gradient ∇_x L(u) = 0. The polynomials f, h_i, g_j are smooth functions. Thus, under the constraint qualification condition, the second order necessity condition (SONC) holds:

  vᵀ ∇²_x L(u) v ≥ 0   for all v ∈ G(u)^⊥.        (5)

In the above, G(u) denotes the Jacobian of the active constraining polynomials

  G(u) = Jacobian( h_1, ..., h_{m_1}, g_{j_1}, ..., g_{j_r} ) |_{x=u}

and G(u)^⊥ denotes the null space of G(u). If it holds that

  vᵀ ∇²_x L(u) v > 0   for all 0 ≠ v ∈ G(u)^⊥,        (6)

then the second order sufficiency condition (SOSC) holds at u. The relations among the above conditions can be summarized as follows: if CQC holds at u, then (2), (3) and (5) are necessary conditions for u to be a local minimizer, but they may not be sufficient; if (2), (3), (4) and (6) hold at u ∈ K, then u is a strict local minimizer of (1). We refer to [1, Section 3.3] for such classical results.
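For readers who want to see conditions (2)-(6) in action, they are straightforward to verify numerically at a candidate point. A minimal sketch on a toy instance (our own example, not from [15]), assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import null_space

# Toy instance: min x1^2 + x2^2  s.t.  g(x) = x1 + x2 - 1 >= 0.
# The minimizer is u = (1/2, 1/2), with the constraint active.
u = np.array([0.5, 0.5])

grad_f = 2 * u                 # gradient of f at u
grad_g = np.array([1.0, 1.0])  # gradient of g (constant)

# FOOC (2): solve grad_f = mu * grad_g by least squares; here mu = 1 > 0,
# so the strict complementarity condition (4) holds as well.
mu = np.linalg.lstsq(grad_g.reshape(-1, 1), grad_f, rcond=None)[0][0]

# SOSC (6): Hessian of L = f - mu*g restricted to the null space of G(u).
hess_L = 2 * np.eye(2)                 # f is quadratic, g is linear
N = null_space(grad_g.reshape(1, -1))  # basis of G(u)-perp
reduced = N.T @ hess_L @ N
print(mu, np.all(np.linalg.eigvalsh(reduced) > 0))  # 1.0 True
```

On this instance all three of CQC, SCC and SOSC hold, so u is a strict local (in fact global, by convexity) minimizer.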

Mathematically, CQC, SCC and SOSC are sufficient for local optimality, but may not be necessary. However, for generic cases, they are sufficient and necessary conditions. This is a major conclusion of [15]. Denote by $R[x]_d$ the set of real polynomials in $x$ with degrees at most $d$. Let $[m] := \{1, \ldots, m\}$. The following theorem is from [15].

Theorem 1. Let $d_0, d_1, \ldots, d_{m_1}, d_1', \ldots, d_{m_2}'$ be positive integers. Then there exists a finite set of nonzero polynomials $\varphi_1, \ldots, \varphi_L$, which are polynomials in the coefficients of $f \in R[x]_{d_0}$, $h_i \in R[x]_{d_i}$ ($i \in [m_1]$), $g_j \in R[x]_{d_j'}$ ($j \in [m_2]$), such that if

    $\varphi_1(f, h_1, \ldots, h_{m_1}, g_1, \ldots, g_{m_2}) \neq 0, \;\; \ldots, \;\; \varphi_L(f, h_1, \ldots, h_{m_1}, g_1, \ldots, g_{m_2}) \neq 0,$

then CQC, SCC and SOSC hold at every local minimizer of (1).

Theorem 1 implies that the local conditions CQC, SCC and SOSC hold at every local minimizer in the space of input polynomials with given degrees, except on a union of finitely many hypersurfaces. So, they hold in an open dense set in the space of input polynomials. Therefore, CQC, SCC and SOSC can be used as sufficient and necessary conditions for checking local optimality, for almost all polynomial optimization problems. This fact was observed in nonlinear programming.

2. A global optimality condition

Let $u$ be a feasible point for (1). By definition, $u$ is a global minimizer if and only if

    $f(x) - f(u) \geq 0 \quad \forall\, x \in K.$    (7)

Typically, it is quite difficult to check (7) directly. In practice, people are interested in easily checkable conditions ensuring (7). For polynomial optimization, this is possible by using sum-of-squares type representations.

Let $R[x]$ be the ring of real polynomials in $x := (x_1, \ldots, x_n)$. A polynomial $p \in R[x]$ is said to be a sum of squares (SOS) if $p = p_1^2 + \cdots + p_k^2$ for some $p_1, \ldots, p_k \in R[x]$. A sufficient condition for (7) is that there exist polynomials $\lambda_1, \ldots, \lambda_{m_1} \in R[x]$ and SOS polynomials $\sigma_0, \sigma_1, \ldots, \sigma_{m_2} \in R[x]$ such that

    $f(x) - f(u) = \sum_{i=1}^{m_1} \lambda_i(x) h_i(x) + \sum_{j=0}^{m_2} \sigma_j(x) g_j(x).$    (8)

The equality in (8) is a polynomial identity in the variables $x$. Note that for every feasible point $x$ of (1), the right hand side of (8) is always nonnegative. This is why (8) ensures that $u$ is a global minimizer. The condition (8) was investigated by Lasserre [6]. It was a major tool for solving the optimization problem (1) globally. We call (8) a global optimality condition for (1).

People wonder when the global optimality condition holds. The representation of $f(x) - f(u)$ in (8) was motivated by Putinar's Positivstellensatz [16], which gives SOS type certificates for polynomials that are positive or nonnegative on the set $K$. Denote

    $\langle h \rangle := h_1 R[x] + \cdots + h_{m_1} R[x],$

which is the ideal generated by the polynomial tuple $h$. Let $\Sigma[x]$ be the set of all SOS polynomials in $R[x]$. The polynomial tuple $g$ generates the quadratic module

    $Q(g) := \Sigma[x] + g_1 \Sigma[x] + \cdots + g_{m_2} \Sigma[x].$

If there exists a polynomial $p \in \langle h \rangle + Q(g)$ such that the set $\{ x \in R^n : p(x) \geq 0 \}$ is compact, then $\langle h \rangle + Q(g)$ is said to be archimedean. The archimedeanness of $\langle h \rangle + Q(g)$ implies the compactness of $K$, while the reverse is not necessarily true. However, when $K$ is compact, we can always add a redundant condition like $R - \|x\|_2^2 \geq 0$ to the tuple $g$ so that $\langle h \rangle + Q(g)$ is archimedean. Hence, archimedeanness of $\langle h \rangle + Q(g)$ is almost equivalent to the compactness of $K$. Putinar's Positivstellensatz [16] says that if $\langle h \rangle + Q(g)$ is archimedean, then every polynomial which is strictly positive on $K$ belongs to $\langle h \rangle + Q(g)$.

The global optimality condition (8) is equivalent to the membership

    $f(x) - f(u) \in \langle h \rangle + Q(g).$

When $u$ is a global minimizer of (1), the polynomial $\tilde f(x) := f(x) - f(u)$ is nonnegative on $K$, but not strictly positive on $K$, because $u$ is always a zero point of $\tilde f$ on $K$. So, Putinar's Positivstellensatz itself does not imply the global optimality condition (8). Indeed, there are counterexamples where (8) fails to hold. For instance, when $f$ is the Motzkin polynomial $x_1^2 x_2^2 (x_1^2 + x_2^2 - 3 x_3^2) + x_3^6$ and $K$ is the unit ball, then (8) fails to hold.

However, the global optimality condition (8) holds for almost all polynomials $f, h_i, g_j$, i.e., it holds in an open dense set in the space of input polynomials. This is a major conclusion of [15]. The ideal $\langle h \rangle$ is said to be real if every polynomial in $R[x]$ vanishing on the set $\{ x \in R^n : h(x) = 0 \}$ belongs to $\langle h \rangle$ (cf. [2]). This is a general condition. For instance, if $\langle h \rangle$ is a prime ideal and $h$ has a nonsingular real zero, then $\langle h \rangle$ is real (cf. [2]).
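As a small worked illustration of (8) (ours, not from [15]), take the same toy instance as above: minimize $f(x) = x_1 + x_2$ over $K = \{ x \in R^2 : h(x) = x_1^2 + x_2^2 - 1 = 0 \}$, whose minimum value is $-\sqrt{2}$ at $u = (-1/\sqrt{2}, -1/\sqrt{2})$. Direct expansion verifies the identity

    $f(x) - f(u) = x_1 + x_2 + \sqrt{2} = \frac{1}{\sqrt{2}}\Big[ \big(x_1 + \tfrac{1}{\sqrt{2}}\big)^2 + \big(x_2 + \tfrac{1}{\sqrt{2}}\big)^2 \Big] - \frac{1}{\sqrt{2}}\, \big( x_1^2 + x_2^2 - 1 \big),$

which has exactly the form (8) with $\lambda_1 = -1/\sqrt{2}$ and $\sigma_0$ an SOS polynomial, certifying that $u$ is a global minimizer.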
As pointed out earlier, when the feasible set $K$ is compact, we can generally assume that $\langle h \rangle + Q(g)$ is archimedean. Interestingly, the local conditions CQC, SCC and SOSC imply the global optimality condition (8), under the archimedeanness of $\langle h \rangle + Q(g)$. The following theorem is a consequence of the results in [15].

Theorem 2. Assume that the ideal $\langle h \rangle$ is real and the set $\langle h \rangle + Q(g)$ is archimedean. If the constraint qualification condition, strict complementarity condition, and second order sufficiency condition hold at every global minimizer of (1), then the global optimality condition (8) holds.

Proof. At every global minimizer $u$ of $f$ on $K$, the CQC, SCC and SOSC conditions imply that the boundary hessian condition holds at $u$, by Theorem 3.1 of [15]. The boundary hessian condition was introduced by Marshall [11] (see Condition 2.3 of [15]). Let $f_{min}$ be the global minimum value of (1). Denote $V = \{ x \in R^n : h(x) = 0 \}$ and let $I(V)$ be the set of all polynomials vanishing on $V$. By Theorem 9.5.3 of [10] (also see Theorem 2.4 of [15]), we have

    $f(x) - f_{min} \in I(V) + Q(g).$

Because $\langle h \rangle$ is real, $\langle h \rangle = I(V)$ and

    $f(x) - f(u) \in \langle h \rangle + Q(g).$

So, the global optimality condition (8) holds.

By Theorem 1, the local conditions CQC, SCC and SOSC hold generically, i.e., in an open dense set in the space of input polynomials. Therefore, the global optimality condition (8) also holds generically, when $\langle h \rangle$ is real and $\langle h \rangle + Q(g)$ is archimedean.

3. Lasserre's hierarchy

Lasserre [6] introduced a sequence of semidefinite relaxations for solving (1) globally, which is now called Lasserre's hierarchy in the literature. It can be described in two equivalent versions.
One version uses SOS type representations, while the other uses moment and localizing matrices. They are dual to each other, as shown in [6]. For convenience of description, we present the SOS version here. For each $k \in N$ (the set of nonnegative integers), denote the sets of polynomials (note $g_0 = 1$)

    $\langle h \rangle_{2k} := \Big\{ \sum_{i=1}^{m_1} \lambda_i h_i \;:\; \text{each } \lambda_i \in R[x] \text{ and } \deg(\lambda_i h_i) \leq 2k \Big\},$

    $Q_k(g) := \Big\{ \sum_{j=0}^{m_2} \sigma_j g_j \;:\; \text{each } \sigma_j \in \Sigma[x] \text{ and } \deg(\sigma_j g_j) \leq 2k \Big\}.$

Note that $\langle h \rangle_{2k}$ is a truncation of $\langle h \rangle$ and $Q_k(g)$ is a truncation of $Q(g)$. The SOS version of Lasserre's hierarchy is the sequence of relaxations

    $\max \;\; \gamma \quad \text{s.t.} \quad f - \gamma \in \langle h \rangle_{2k} + Q_k(g)$    (9)

for $k = 1, 2, \ldots$. The problem (9) is equivalent to a semidefinite program (SDP), so it can be solved as an SDP by numerical methods. For instance, the software GloptiPoly 3 [3] and SeDuMi [18] can be used to solve it. We refer to [7, 9, 10] for recent work in polynomial optimization.

Let $f_{min}$ be the minimum value of (1) and let $f_k$ denote the optimal value of (9). It was shown that (cf. [6])

    $f_k \leq f_{k+1} \leq \cdots \leq f_{min}.$

When $\langle h \rangle + Q(g)$ is archimedean, Lasserre [6] proved the asymptotic convergence

    $f_k \to f_{min}$ as $k \to \infty$.

If $f_k = f_{min}$ for some $k$, Lasserre's hierarchy is said to have finite convergence. It is possible that the sequence $\{f_k\}$ has only asymptotic, but not finite, convergence. For instance, this is the case when $f$ is the Motzkin polynomial $x_1^2 x_2^2 (x_1^2 + x_2^2 - 3 x_3^2) + x_3^6$ and $K$ is the unit ball [12, Example 5.3]. Indeed, such $f$ always exists whenever $\dim(K) \geq 3$, which is implied by [17, Prop. 6.1]. However, such cases do not happen very often. Lasserre's hierarchy often has finite convergence in practice, which was demonstrated by extensive numerical experiments in polynomial optimization (cf. [4, 5]).

A major conclusion of [15] is that Lasserre's hierarchy almost always has finite convergence. Specifically, it was shown that Lasserre's hierarchy has finite convergence when the local conditions CQC, SCC and SOSC are satisfied, under the archimedeanness. The following theorem is shown in [15].

Theorem 3. Assume that $\langle h \rangle + Q(g)$ is archimedean. If the constraint qualification, strict complementarity and second order sufficiency conditions hold at every global minimizer of (1), then Lasserre's hierarchy of (9) has finite convergence.

By Theorem 1, the local conditions CQC, SCC and SOSC hold at every local minimizer, in an open dense set in the space of input polynomials. This implies that, under the archimedeanness of $\langle h \rangle + Q(g)$, Lasserre's hierarchy has finite convergence in an open dense set in the space of input polynomials. That is, Lasserre's hierarchy almost always (i.e., generically) has finite convergence. This is a major conclusion of [15].

If one of the assumptions in Theorem 3 does not hold, then $\{f_k\}$ may fail to have finite convergence; counterexamples were shown in §3 of [15]. On the other hand, there exist other non-generic conditions that ensure finite convergence of $\{f_k\}$. For instance, if $h$ has finitely many real or complex zeros, then $\{f_k\}$ has finite convergence (cf. [8, 14]).

Since the minimum value $f_{min}$ is typically not known, a practical concern is how to check $f_k = f_{min}$ in computation. This issue was addressed in [13]. Flat truncation is generally a sufficient and necessary condition for checking finite convergence.
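To make (9) concrete, here is a minimal sketch of ours (it assumes the Python package cvxpy with its default SDP-capable solver, in place of the MATLAB tools GloptiPoly 3 and SeDuMi cited above) of the order $k = 2$ relaxation for minimizing $f(x) = x^4 - 2x^2$ over $K = \{ x : 1 - x^2 \geq 0 \}$. Writing $\sigma_0$ and $\sigma_1$ through Gram matrices on the monomials $(1, x, x^2)$ and $(1, x)$ and matching coefficients of $f - \gamma = \sigma_0 + \sigma_1 (1 - x^2)$ gives:

    import cvxpy as cp

    # Order k=2 SOS relaxation (9) for: minimize x^4 - 2x^2 over {x : 1 - x^2 >= 0}.
    # sigma0 = [1,x,x^2] Q [1,x,x^2]^T and sigma1 = [1,x] P [1,x]^T, with Q, P PSD.
    Q = cp.Variable((3, 3), PSD=True)
    P = cp.Variable((2, 2), PSD=True)
    gamma = cp.Variable()

    # Match coefficients of x^0,...,x^4 in x^4 - 2x^2 - gamma = sigma0 + sigma1*(1 - x^2).
    constraints = [
        Q[0, 0] + P[0, 0] == -gamma,                       # x^0
        2 * Q[0, 1] + 2 * P[0, 1] == 0,                    # x^1
        2 * Q[0, 2] + Q[1, 1] + P[1, 1] - P[0, 0] == -2,   # x^2
        2 * Q[1, 2] - 2 * P[0, 1] == 0,                    # x^3
        Q[2, 2] - P[1, 1] == 1,                            # x^4
    ]
    cp.Problem(cp.Maximize(gamma), constraints).solve()
    print(gamma.value)   # about -1.0 = f_min; here f - gamma = (x^2 - 1)^2 is SOS

Since $f - f_{min} = (x^2-1)^2$ is itself SOS, the relaxation is exact already at this order, an instance of the finite convergence discussed above.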
For non-generic polynomial optimization problems, it is possible that the sequence $\{f_k\}$ does not have finite convergence to $f_{min}$. People are interested in methods that have finite convergence for minimizing all polynomials over a given set $K$. The Jacobian SDP relaxation proposed in [12] can be applied for this purpose. It gives a sequence of lower bounds that converge finitely to $f_{min}$, for every polynomial $f$ that has a global minimizer over a general set $K$.

Acknowledgement. The research was partially supported by the NSF grants DMS-0844775 and DMS-1417985.

REFERENCES

[1] D. Bertsekas. Nonlinear Programming, second edition. Athena Scientific, 1995.

[2] J. Bochnak, M. Coste and M.-F. Roy. Real Algebraic Geometry. Springer, 1998.

[3] D. Henrion, J. Lasserre and J. Loefberg. GloptiPoly 3: moments, optimization and semidefinite programming. http://homepages.laas.fr/henrion/software/gloptipoly3/

[4] D. Henrion and J.B. Lasserre. GloptiPoly: Global optimization over polynomials with Matlab and SeDuMi. ACM Trans. Math. Soft., 29, pp. 165–194, 2003.

[5] D. Henrion and J.B. Lasserre. Detecting global optimality and extracting solutions in GloptiPoly. Positive Polynomials in Control, pp. 293–310, Lecture Notes in Control and Inform. Sci., 312, Springer, Berlin, 2005.

[6] J.B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11(3):796–817, 2001.

[7] J.B. Lasserre. Moments, Positive Polynomials and Their Applications. Imperial College Press, 2009.

[8] M. Laurent. Semidefinite representations for finite varieties. Mathematical Programming, Vol. 109, pp. 1–26, 2007.

[9] M. Laurent. Sums of squares, moment matrices and optimization over polynomials. Emerging Applications of Algebraic Geometry, Vol. 149 of IMA Volumes in Mathematics and its Applications (Eds. M. Putinar and S. Sullivant), Springer, pp. 157–270, 2009.

[10] M. Marshall. Positive Polynomials and Sums of Squares. Mathematical Surveys and Monographs, 146. American Mathematical Society, Providence, RI, 2008.

[11] M. Marshall. Representation of non-negative polynomials, degree bounds and applications to optimization. Canad. J. Math., 61 (2009), pp. 205–221.

[12] J. Nie. An exact Jacobian SDP relaxation for polynomial optimization. Mathematical Programming, Ser. A, Vol. 137, pp. 225–255, 2013.

[13] J. Nie. Certifying convergence of Lasserre's hierarchy via flat truncation. Mathematical Programming, Ser. A, Vol. 142, No. 1–2, pp. 485–510, 2013.

[14] J. Nie. Polynomial optimization with real varieties. SIAM Journal on Optimization, Vol. 23, No. 3, pp. 1634–1646, 2013.

[15] J. Nie. Optimality conditions and finite convergence of Lasserre's hierarchy. Mathematical Programming, Ser. A, Vol. 146, No. 1–2, pp. 97–121, 2014.

[16] M. Putinar. Positive polynomials on compact semi-algebraic sets. Ind. Univ. Math. J., 42 (1993), 969–984.

[17] C. Scheiderer. Sums of squares of regular functions on real algebraic varieties. Trans. Am. Math. Soc., 352, 1039–1069 (1999).

[18] J.F. Sturm. SeDuMi 1.02: a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11&12 (1999), 625–653. http://sedumi.ie.lehigh.edu

Convergence Rate Analysis of Several Splitting Schemes

Damek Davis
Department of Mathematics
University of California, Los Angeles, CA
[email protected]

Our paper [12] is concerned with the following unconstrained minimization problem:

    $\min_{x \in H} \; f(x) + g(x)$    (1)

where $H$ is a Hilbert space and $f, g : H \to (-\infty, \infty]$ are closed, proper, and convex functions. Through duality, the results generalize to the problem:

    $\min_{x \in H_1,\, y \in H_2} \; f(x) + g(y) \quad \text{subject to} \quad Ax + By = 0,$    (2)

where $A : H_1 \to G$ and $B : H_2 \to G$ are linear mappings and $H_1$, $H_2$, and $G$ are Hilbert spaces. At first this decomposition of the objective into two functions may seem superfluous, but it captures the motivation of operator-splitting methods: the whole convex problem may be as complicated as you wish, but it can be decomposed into two or more (possibly) simpler terms.

The above decomposition suggests that we try to minimize the complicated objective $f + g$ by alternately solving a series of subproblems related to minimizing $f$ and $g$ separately. The following strongly convex optimization problem is one of the main subproblems we use:

    $\mathrm{prox}_{\gamma f}(x) := \mathrm{argmin}_{y \in H} \; f(y) + \frac{1}{2\gamma} \| y - x \|^2,$

where the input $x$ encodes some information about $f + g$, so the subproblem does not just minimize $f$ alone. It turns out that many functions arising in machine learning, signal processing, and imaging have simple or closed form proximal operators [7, 20, 4, 16], which is why operator-splitting algorithms are so effective for large-scale problems.
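For instance, the proximal operator of $\gamma \| \cdot \|_1$ is the classical coordinatewise soft-thresholding map. A minimal NumPy sketch (ours, for illustration):

    import numpy as np

    def prox_l1(x, gamma):
        # Soft-thresholding: the closed-form proximal operator of gamma * ||.||_1.
        return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

    print(prox_l1(np.array([1.5, -0.2, 0.7]), 0.5))   # -> [ 1.  -0.   0.2]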

Since von Neumann devised the method of alternating projections [24], several operator splitting methods have been proposed [18, 21, 23] to solve a variety of decomposable problems [6, 14, 8, 9, 25, 5].

Although splitting methods are observed to perform well in practice, their convergence rates were unknown for many years; this is a big limitation for practitioners. Without convergence rates, practitioners cannot predict practical performance, compare competing methods without implementing both, or understand why methods perform well on nice problems but are slow on other problems of the same size. The award winning manuscript [12] and the follow up work [13, 10, 11] resolve these issues for a variety of splitting algorithms.

Analyzing convergence rates of splitting methods is challenging because the classical objective error analysis breaks down. Here, the objective no longer decreases monotonically; in some cases, objective values can be completely useless. For example, finding a point in the intersection of sets $C_1$ and $C_2$ can be formulated as Problem (1) with $f = \iota_{C_1}$ and $g = \iota_{C_2}$. Notice that the objective function $f + g$ is finite (and equal to 0) only at a solution $x \in C_1 \cap C_2$. The Douglas-Rachford splitting method (which is a special case of the Peaceman-Rachford splitting method; see Equation (3)) produces a sequence of points converging to a solution. In general, all points in the sequence have an infinite objective value, but the limit point has a zero objective value! Thus, we must take a different approach to analyze operator-splitting methods: we first prove convergence rates for the fixed-point residual associated to a nonexpansive mapping, which implies bounds for certain subgradients that we use to bound the objective function.

[Photo: Sanjay Mehrotra, Damek Davis and Serhat Aybat]

A sketch of the upper complexity proof.

A key property used in our analysis is the firm nonexpansiveness of $\mathrm{prox}_{\gamma f}$, which means that $2\,\mathrm{prox}_{\gamma f} - I_H$ is nonexpansive, i.e., 1-Lipschitz continuous [1, Proposition 4.2]. Because the composition of two nonexpansive operators remains nonexpansive, this property implies that the Peaceman-Rachford operator $T_{PRS} := (2\,\mathrm{prox}_{\gamma f} - I) \circ (2\,\mathrm{prox}_{\gamma g} - I)$ is nonexpansive [18]. Thus, the Krasnosel'skiĭ-Mann (KM) [17, 19] iteration with damping $\lambda \in (0, 1)$,

    $z^{k+1} = (1 - \lambda)\, z^k + \lambda\, T_{PRS}(z^k), \quad k = 0, 1, \ldots,$    (3)

will weakly converge to a fixed point of $T_{PRS}$ whenever one exists [1, Theorem 5.14]. We are interested in fixed points $z^*$ of $T_{PRS}$ because they immediately yield minimizers through the relation $\mathrm{prox}_{\gamma g}(z^*) \in \mathrm{argmin}(f + g)$ [1, Theorem 25.6]. Furthermore, the sequence $(\mathrm{prox}_{\gamma g}(z^j))_{j \geq 0}$ weakly converges to a minimizer of $f + g$ when one exists [22].

For any nonexpansive mapping $T : H \to H$, the fixed-point residual (FPR) $\| Tz - z \|$ measures how "close" $z$ is to being a fixed point. A key result in our paper establishes the convergence rate of this quantity:

Theorem 4. Let $\lambda \in (0, 1)$. For any nonexpansive map $T : H \to H$ with nonempty fixed-point set, we have

    $\| T z^k - z^k \| = o\!\left( \frac{1}{\sqrt{k+1}} \right)$

where $z^0 \in H$ and, for all $k \geq 0$, $z^{k+1} = (1 - \lambda) z^k + \lambda T z^k$.

Roughly, this theorem follows by showing that $(\| z^{j+1} - z^j \|^2)_{j \geq 0}$ is monotonic and summable. The convergence rate then follows from the following simple lemma:

Lemma 5. Let $(a_j)_{j \geq 0} \subseteq R_{++}$ be a monotonically decreasing and summable sequence. Then $a_k = o(1/(k+1))$.

Proof. $(k+1)\, a_{2k} \leq a_{2k} + \cdots + a_k \to 0$ as $k \to \infty$.

We already mentioned that the convergence rate of the fixed-point residual would play a role in bounding subgradients associated to the relaxed PRS algorithm. Using some basic convex analysis, Figure 1 "expands" a single iteration of Algorithm (3) into a series of "subgradient steps."

Figure 1: A single iteration of the relaxed PRS algorithm, with $x_g = \mathrm{prox}_{\gamma g}(z)$ and $x_f = \mathrm{prox}_{\gamma f}(2x_g - z)$. Note that $z^+ = (1/2) z + (1/2) T_{PRS}(z)$, $\tilde\nabla g(x_g) \in \partial g(x_g)$ and $\tilde\nabla f(x_f) \in \partial f(x_f)$.

A few identities immediately follow from Figure 1:

    $z^+ - z = x_f - x_g = -\gamma \big( \tilde\nabla g(x_g) + \tilde\nabla f(x_f) \big).$    (4)

Note that whenever $T_{PRS}(z) = z$, we have $x^* := x_f = x_g$ and $\tilde\nabla g(x^*) + \tilde\nabla f(x^*) = 0$, so $x^* = \mathrm{prox}_{\gamma g}(z)$ is optimal for Problem (1). With this identity and Theorem 4, we get the following result:

Theorem 6. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). For all $k \geq 0$, with $x_g^k$, $\tilde\nabla g(x_g^k)$, $x_f^k$, and $\tilde\nabla f(x_f^k)$ defined as in Figure 1, we have

    $\| x_f^k - x_g^k \| = \gamma \big\| \tilde\nabla g(x_g^k) + \tilde\nabla f(x_f^k) \big\| = o\!\left( \frac{1}{\sqrt{k+1}} \right).$

By applying the basic inequality $h(y) \geq h(x) + \langle \tilde\nabla h(x),\, y - x \rangle$ to $f$ and $g$, we obtain the next theorem.

Theorem 7. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). Let $x^*$ be a minimizer of $f + g$. For all $k \geq 0$, let $x_g^k$ and $x_f^k$ be defined as in Figure 1. Then

    $| f(x_f^k) + g(x_g^k) - f(x^*) - g(x^*) | = o\!\left( \frac{1}{\sqrt{k+1}} \right).$

Furthermore, when $f$ is Lipschitz continuous, we have

    $0 \leq (f+g)(x_g^k) - (f+g)(x^*) = o\!\left( \frac{1}{\sqrt{k+1}} \right).$
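The iteration (3) is only a few lines of code. The sketch below (ours, assuming NumPy; the two lines are illustrative choices) runs the damped iteration with $\lambda = 1/2$, i.e., Douglas-Rachford, on the two-set feasibility example discussed earlier, with each prox a projection, and prints the fixed-point residual $\| T_{PRS} z^k - z^k \|$, which decays as Theorem 4 guarantees:

    import numpy as np

    def proj_line(d):
        # Orthogonal projector onto the line span{d} through the origin.
        d = d / np.linalg.norm(d)
        return lambda z: d * (d @ z)

    # Feasibility: f, g are indicators of two lines, so each prox is a projection.
    prox_f = proj_line(np.array([1.0, 0.2]))   # a line of slope 0.2
    prox_g = proj_line(np.array([1.0, 0.0]))   # the horizontal axis

    def T_prs(z):
        # Peaceman-Rachford operator: composition of the two reflections.
        r = 2 * prox_g(z) - z
        return 2 * prox_f(r) - r

    lam = 0.5                                  # damping; lambda = 1/2 gives DRS
    z = np.array([10.0, 10.0])
    for k in range(5001):
        z = (1 - lam) * z + lam * T_prs(z)     # KM iteration (3)
        if k % 1000 == 0:
            print(k, np.linalg.norm(T_prs(z) - z))   # FPR, cf. Theorem 4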

Theorem 7 shows that the relaxed PRS algorithm is strictly faster than the slowest performing first-order (subgradient) methods. Splitting methods are known to perform well in practice, so it is natural to suspect that this is not the best we can do. The following section shows that this intuition is partially true.

Acceleration through averaging.

Can we improve upon the worst-case $o(1/\sqrt{k+1})$ complexity in Theorem 7? The following example gives us some insight into how we might improve.

Figure 2: The relaxed PRS algorithm ($\lambda = 1/2$) applied to the intersection of two lines $L_1$ and $L_2$ (the horizontal axis and the line through the origin with slope 0.2). Formally, $g = \iota_{L_1}$ and $f = \iota_{L_2}$ are the indicator functions of $L_1$ and $L_2$. The left side (red) is the actual trajectory of the DRS algorithm (i.e., the sequence $(z^j)_{j \geq 0}$). The right side (blue) is obtained through averaging: $\bar z^k := (1/(k+1)) \sum_{i=0}^k z^i$.

Figure 2 replaces the $k$-th iterate $z^k$ generated by Algorithm (3) with the averaged, or ergodic, iterate $\bar z^k := (1/(k+1)) \sum_{i=0}^k z^i$. It is clear that, at least initially, the ergodic iterate outperforms the standard, or nonergodic, iterate because of the cancellation induced from averaging the spiraling sequence. In actuality, the nonergodic iterate eventually surpasses the ergodic iterate and converges linearly with rate $\approx 0.9806$ [2], but this example gives us an idea of why averaging can help.

Inspired by Figure 2, we introduced the following averaging scheme for relaxed PRS to obtain a faster $O(1/(k+1))$ worst-case convergence rate.

Theorem 8. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). Let $x^*$ be a minimizer of $f + g$. For all $k \geq 0$, let $x_g^k$ and $x_f^k$ be defined as in Figure 1, and let $\bar x_g^k := (1/(k+1)) \sum_{i=0}^k x_g^i$ and $\bar x_f^k := (1/(k+1)) \sum_{i=0}^k x_f^i$. Then

    $| f(\bar x_f^k) + g(\bar x_g^k) - f(x^*) - g(x^*) | = O\!\left( \frac{1}{k+1} \right).$

Furthermore, when $f$ is Lipschitz continuous, we have

    $0 \leq (f+g)(\bar x_g^k) - (f+g)(x^*) = O\!\left( \frac{1}{k+1} \right).$

The rate $O(1/(k+1))$ is a big improvement over the $o(1/\sqrt{k+1})$ rate from Theorem 7. However, it is not clear that averaging will always result in faster practical performance. Indeed, consider the following basis pursuit problem:

    $\min_{x \in R^d} \; \| x \|_1 \quad \text{subject to} \quad Ax = b$    (5)

where $A \in R^{m \times d}$ is a matrix and $b \in R^m$ is a vector. We can apply the relaxed PRS algorithm to this problem by setting $g(x) = \| x \|_1$ and $f(x) = \iota_{\{ y \,\mid\, Ay = b \}}(x)$, where $\iota_C$ denotes the $\{0, \infty\}$-valued indicator function of a set $C$.

Figure 3: The objective error of the basis pursuit problem (5).

Figure 3 shows a numerical experiment on the basis-pursuit problem. Three things are clear: (i) the nonergodic iterate is highly oscillatory, as suggested earlier; (ii) the performance of the nonergodic iterate eventually surpasses the ergodic iterate; (iii) the ergodic iterate eventually converges with rate exactly $O(1/(k+1))$ and no faster. This situation is typical, and suggests that averaging is not necessarily a good idea in all scenarios. Nevertheless, the ergodic iterate is nearly costless to obtain, so it never hurts to compute it alongside.
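The experiment behind Figure 3 is easy to reproduce in spirit. Below is a minimal sketch of ours (assuming NumPy; the data $A$, $b$ and the stepsize are illustrative choices, not the paper's test problem): $\mathrm{prox}_{\gamma g}$ is soft-thresholding, $\mathrm{prox}_{\gamma f}$ is the projection onto $\{ x : Ax = b \}$, and we track both the nonergodic and ergodic iterates:

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 10, 30
    A = rng.standard_normal((m, d))
    x_true = np.zeros(d); x_true[:3] = [1.0, -2.0, 0.5]   # a sparse signal
    b = A @ x_true
    pinvA = np.linalg.pinv(A)

    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1
    proj = lambda v: v - pinvA @ (A @ v - b)   # projection onto {x : Ax = b}

    gamma, lam = 1.0, 0.5                      # illustrative stepsize and damping
    z = np.zeros(d)
    xbar = np.zeros(d)
    for k in range(5000):
        xg = soft(z, gamma)                    # x_g = prox_{gamma g}(z)
        xf = proj(2 * xg - z)                  # x_f = prox_{gamma f}(2 x_g - z)
        z = z + 2 * lam * (xf - xg)            # equals (1-lam) z + lam T_PRS(z)
        xbar = (k * xbar + xg) / (k + 1)       # ergodic average of the x_g iterates
    print("nonergodic ||x_g||_1:", np.abs(xg).sum())
    print("ergodic    ||xbar||_1:", np.abs(xbar).sum())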


Figure 4: Five iterations of relaxed PRS applied to an infinite direct sum of examples similar to the one shown in Figure 2. The angles $(\theta_j)_{j \geq 0}$ converge to 0. The black and blue dots coincide in the first component. The infinite vectors of black (on the tip of the spiral), blue (on the horizontal axis), and magenta (on the non-horizontal line) dots are the vectors $z^5$, $x_g^5$, and the projection of $x_g^5$ onto the non-horizontal line, respectively. The objective functions in Theorem 9 are as follows: $g$ is the indicator of the horizontal axis and $f$ is the distance function to the non-horizontal line. Thus, $g(x_g^5) = 0$ and $f(x_g^5)$ is the distance between the blue and magenta dots.

The worst-case nonergodic rate is sharp.

Although averaging can improve the rate of convergence of relaxed PRS, we observed in Figure 3 that the nonergodic iterate can perform better than the ergodic iterate. Does this mean that our convergence rate in Theorem 7 can be improved? It turns out that the answer is no, at least in infinite dimensional problems:

Theorem 9. There is a Hilbert space $H$ and two closed, proper, and convex functions $f, g : H \to (-\infty, \infty]$ such that $f$ is Lipschitz continuous and the following holds: for all $\varepsilon > 0$, there is an initial point $z^0 \in H$ and a stepsize $\gamma \in R_{++}$ with

    $(f+g)(x_g^k) - (f+g)(x^*) = \Omega\!\left( \frac{1}{(k+1)^{1/2+\varepsilon}} \right)$

where $(z^j)_{j \geq 0}$ is generated by Algorithm (3) with $\lambda = 1/2$ and $x^*$ is the unique minimizer of $f + g$.

The example that proves the sharpness of our $o(1/\sqrt{k+1})$ rate is shown in Figure 4. The proof of Theorem 9 is technical and involves estimating an oscillatory integral, so we cannot get into the details in this brief article.

Follow up work.

In a later paper [13], we show that relaxed PRS and ADMM automatically adapt to the regularity of Problems (1) and (2). For example, we show that for all minimizers $x^*$ of $f + g$, the "best" objective error satisfies

    $\min_{i=0,\ldots,k} \big\{ (f+g)(x_g^i) - (f+g)(x^*) \big\} = o\!\left( \frac{1}{k+1} \right)$

whenever $f$ is differentiable and $\nabla f$ is Lipschitz continuous. Thus, relaxed PRS obtains the same rate of convergence as the forward-backward splitting algorithm (also called ISTA) [3], which requires knowledge of the Lipschitz constant of $\nabla f$ to ensure convergence (or a suitable line search procedure). The paper also includes general conditions for linear convergence, linear convergence for feasibility problems with nice intersections, and convergence rates under strong convexity assumptions. Finally, the papers [10, 11] prove convergence rates for a wide class of algorithms that are significantly more powerful than relaxed PRS and ADMM.

Conclusion.

The results of our paper show that the relaxed PRS method can be nearly as slow as its worst-case $o(1/\sqrt{k+1})$ rate (Theorem 9). This is unexpected because splitting methods tend to perform well in practice. We can get around this issue by computing the ergodic sequence (Theorem 8) and testing whether it has a smaller objective value than the nonergodic sequence. This procedure is essentially costless, although the ergodic sequence tends to perform worse in practice (Figure 3). In the paper, we include several other complexity results, including the first example of arbitrarily slow convergence for the Douglas-Rachford splitting method. We also analyze extensions of Problem (1) and obtain similar results for the alternating direction method of multipliers (ADMM) [15].

Acknowledgements.

I would like to say thank you to the prize committee for this honor. I am also grateful to Professor Wotao Yin for taking me on as a student and collaborating with me on this project.

REFERENCES

[1] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011.

[2] Heinz H. Bauschke, J.Y. Bello Cruz, Tran T.A. Nghia, Hung M. Phan, and Xianfu Wang. The rate of linear convergence of the Douglas-Rachford algorithm for subspaces is the cosine of the Friedrichs angle. Journal of Approximation Theory, 185(0):63–79, 2014.

[3] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[5] Luis M. Briceño-Arias and Patrick L. Combettes. A Monotone+Skew Splitting Model for Composite Monotone Inclusions in Duality. SIAM Journal on Optimization, 21(4):1230–1250, 2011.

[6] Antonin Chambolle and Thomas Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

[7] Patrick L. Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and Its Applications, pages 185–212. Springer New York, 2011.

[8] Patrick L. Combettes and Jean-Christophe Pesquet. Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators. Set-Valued and Variational Analysis, 20(2):307–330, 2012.

[9] Laurent Condat. A Primal-Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms. Journal of Optimization Theory and Applications, 158(2):460–479, 2013.

[10] Damek Davis. Convergence rate analysis of primal-dual splitting schemes. arXiv preprint arXiv:1408.4419v2, 2014.

[11] Damek Davis. Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. arXiv preprint arXiv:1410.2654v3, 2014.

[12] Damek Davis and Wotao Yin. Convergence rate analysis of several splitting schemes. arXiv preprint arXiv:1406.4834v2, 2014.

[13] Damek Davis and Wotao Yin. Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. arXiv preprint arXiv:1407.5210v2, 2014.

[14] Ernie Esser, Xiaoqun Zhang, and Tony Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010.

[15] Roland Glowinski and Américo Marrocco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Rev. Française d'Aut. Inf. Rech. Opér., R-2:41–76, 1975.

[16] Thomas Goldstein and Stanley Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009.

[17] Mark Aleksandrovich Krasnosel'skii. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10(1):123–127, 1955.

[18] Pierre-Louis Lions and Bertrand Mercier. Splitting Algorithms for the Sum of Two Nonlinear Operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

[19] W. Robert Mann. Mean Value Methods in Iteration. Proceedings of the American Mathematical Society, 4(3):506–510, 1953.

[20] Neal Parikh and Stephen Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.

[21] Gregory B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.

[22] Benar F. Svaiter. On Weak Convergence of the Douglas-Rachford Method. SIAM Journal on Control and Optimization, 49(1):280–287, 2011.

[23] Paul Tseng. A Modified Forward-Backward Splitting Method for Maximal Monotone Mappings. SIAM Journal on Control and Optimization, 38(2):431–446, 2000.

[24] John von Neumann. Functional Operators: The Geometry of Orthogonal Spaces. Princeton University Press, 1951. (Reprint of mimeographed lecture notes first distributed in 1933.)

[25] Bằng Công Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics, 38(3):667–681, 2013.

Nominations for Society Prizes Sought

The Society awards four prizes annually at the INFORMS annual meeting. We seek nominations (including self-nominations) for each of them, due by July 15, 2015. Details for each of the prizes, including eligibility rules and past winners, can be found by following the links from http://www.informs.org/Community/Optimization-Society/Prizes.

Each of the four awards includes a cash amount of US$1,000 and a citation plaque. The award winners will be invited to give a presentation in a special session sponsored by the Optimization Society during the INFORMS annual meeting in Philadelphia, PA in November 2015 (the winners will be responsible for their own travel expenses to the meeting). Award winners are also asked to contribute an article about their award-winning work to the annual Optimization Society newsletter.

Nominations, applications, and inquiries for each of the prizes should be made via email to the corresponding prize committee chair.

The Khachiyan Prize is awarded for outstanding lifetime contributions to the field of optimization by an individual or team. The topic of the contribution must belong to the field of optimization in its broadest sense. Recipients of the INFORMS John von Neumann Theory Prize or the MPS/SIAM Dantzig Prize in prior years are not eligible for the Khachiyan Prize. The prize committee for this year's Khachiyan Prize is as follows:

• Ilan Adler (Chair), [email protected]
• Michael Ball
• Don Goldfarb
• Werner Römisch
lished in, or submitted to and accepted by, a refereed We would also like to thank four Society Vice- professional journal within the four calendar years Chairs who will be completing their two-year terms preceding the year of the award. All authors must at the conclusion of the INFORMS meeting: Imre have been awarded their terminal degree within eight P´olik, Juan Pablo Vielma, John Mitchell, and calendar years preceding the year of award. The Vladimir Boginski. prize committee for this year’s Prize for Young Re- Finally, we thank the Newsletter Editor, Shabbir searchers is as follows: Ahmed, who is also stepping down this year. Nick Sahinidis (Chair) We are currently seeking nominations of candi- • [email protected] dates for the following positions: Dan Bienstock • Chair-Elect Sam Burer • • Secretary/Treasurer Andrew Schaefer • • Vice-Chair for Computational Optimization • The Student Paper Prize is awarded to one or and Software more student(s) for an outstanding paper in opti- Vice-Chair for Integer and Discrete Optimiza- • mization that is submitted to and received or pub- tion lished in a refereed professional journal within three Vice-Chair for Linear and Conic Optimization • calendar years preceding the year of the award. Ev- Vice-Chair for Network Optimization ery nominee/applicant must be a student on the first • Newsletter Editor of January of the year of the award. All coauthor(s) • not nominated for the award must send a letter indi- Self nominations for all of these positions are encour- cating that the majority of the nominated work was aged. performed by the nominee(s). The prize committee To ensure a smooth transition of the chairmanship for this year’s Student Paper Prize is as follows: of the Society, the Chair-Elect serves a one-year term Mohit Tawarmalani (Chair) before assuming a two-year position as Chair; thus • [email protected] this is a three-year commitment. As stated in the Fatma Kilin¸c-Karzan Society Bylaws, “The Chair shall be the chief admin- • Warren Powell istrative ocer of the OS and shall be responsible for • the development and execution of the Society’s pro- Uday Shanbhag • gram. He/she shall (a) call and organize meetings of the OS, (b) appoint ad hoc committees as required, (c) appoint chairs and members of standing com- mittees, (d) manage the a↵airs of the OS between meetings, and (e) preside at OS Council meetings and Society membership meetings.” Volume 5 Number 1 May 2015 27

The Secretary/Treasurer serves a two-year term. According to the Society Bylaws, "The Secretary/Treasurer shall conduct the correspondence of the OS, keep the minutes and records of the Society, maintain contact with INFORMS, receive reports of activities from those Society Committees that may be established, conduct the election of officers and Members of Council for the OS, make arrangements for the regular meetings of the Council and the membership meetings of the OS. As treasurer, he/she shall also be responsible for disbursement of the Society funds as directed by the OS Council, prepare and distribute reports of the financial condition of the OS, help prepare the annual budget of the Society for submission to INFORMS. It will be the responsibility of the outgoing Secretary/Treasurer to make arrangements for the orderly transfer of all the Society's records to the person succeeding him/her." The Secretary/Treasurer is allowed to serve at most two consecutive terms; as Jim Luedtke is now completing his second term, the OS must elect a new Secretary/Treasurer this cycle.

Vice-Chairs also serve a two-year term. According to the Society Bylaws, "The main responsibility of the Vice Chairs will be to help INFORMS Local Organizing committees identify cluster chairs and/or session chairs for the annual meetings. In general, the Vice Chairs shall serve as the point of contact with their sub-disciplines."

The Newsletter Editor is appointed by the Society Chair and is responsible for editing this newsletter, which is produced in the spring each year. Although this is not an elected position, nominations for it are being sought at this time for consideration by the Chair.

Please send your nominations or self-nominations to Jim Luedtke ([email protected]), including contact information for the nominee, by June 1, 2015. Online elections will begin in mid-June, with new officers taking up their duties at the conclusion of the 2015 INFORMS annual meeting.

The Optimization groups at Princeton and Rutgers will be hosting the 2016 IOS Conference, and Warren Powell will serve as the General Chair. Stay tuned for more information.