On the Solution of Nonlinear Semidefinite Programs by Augmented Lagrangian Methods
Den Naturwissenschaftlichen Fakultäten der Friedrich-Alexander-Universität Erlangen-Nürnberg zur Erlangung des Doktorgrades
vorgelegt von
Michael Stingl
aus Waldsassen

Als Dissertation genehmigt von den Naturwissenschaftlichen Fakultäten der Friedrich-Alexander-Universität Erlangen-Nürnberg
Tag der mündlichen Prüfung:
Vorsitzender der Prüfungskommission: Prof. Dr. D.-P. Häder
Erstberichterstatter: Prof. Dr. G. Leugering, Friedrich-Alexander-Universität Erlangen-Nürnberg
Zweitberichterstatter: RNDr. M. Kočvara, DrSc., Academy of Sciences of the Czech Republic, Prague
Prof. Dr. F. Jarre, Heinrich-Heine-Universität Düsseldorf
Acknowledgements
This thesis has been growing over a period of several years. It is only natural that it was influenced by many people during that time. I would like to take the opportunity to thank those who contributed directly or indirectly. In particular I want to thank

• my wife Andrea and my sons Jonas and Nikola for their love and patience,
• my parents, who gave me the opportunity to start and finish my diploma studies in mathematics at the University of Erlangen,
• my former supervisor Prof. Dr. J. Zowe, who gave me the opportunity to start my PhD studies at the Chair of Applied Mathematics II in Erlangen; unfortunately he could not continue his work after a serious accident shortly after the beginning of my PhD studies,
• Prof. Dr. G. Leugering, who offered me the chance to finish my PhD studies as a member of his chair and kindly took over the supervision of my thesis during the last year,
• Dr. M. Kočvara, who kindly took over the supervision of my PhD studies after the accident of Prof. Dr. J. Zowe; he always had an open ear for my questions, shared his knowledge with me in innumerable discussions, and was always able to give me new inspiration and useful hints for further investigations,
• Dr. R. Sommer and Dr. R. Werner – both former members of the Chair of Applied Mathematics II in Erlangen – who gave me a lot of support, especially in the first phase of my PhD studies,
• the colleagues at the Chair of Applied Mathematics II in Erlangen for the kind and helpful atmosphere during the time of my PhD studies.
Abstract
This thesis presents a method for the solution of nonlinear semidefinite programming problems. The goal of nonlinear semidefinite programming is to solve optimization problems subject to nonlinear matrix inequalities. The method described can be interpreted as a generalization of the modified barrier method for the solution of nonlinear programs subject to vector-valued inequality constraints. As such, the method belongs to the general category of augmented Lagrangian type methods. Along with a comprehensive convergence analysis, covering local and global convergence aspects, emphasis is given to the efficient implementation of the method. A detailed algorithm is presented and numerical aspects of the computer code PENNON are described. The thesis concludes with reports on extensive numerical studies performed with PENNON. Moreover, two classes of nonlinear semidefinite programming problems arising from structural optimization and control theory are discussed in detail.

The first chapter contains an overview of related work in the area of nonlinear semidefinite programming. Further, examples of applications of nonlinear semidefinite programming are given, and the motivation for the development of the method described in this thesis is presented.

The second chapter introduces concepts from the area of matrix analysis. In particular, so-called primary matrix functions and (directional) derivatives of matrix functions are defined.

In the third chapter of this thesis, the nonlinear semidefinite programming problem is specified in more detail. Several assumptions, among them optimality conditions of first and second order and constraint qualifications, are formulated, and the impact of these assumptions is discussed.

The fourth chapter introduces a class of matrix penalty functions.
Concrete examples of this class are presented, and matrix penalty functions that are uniquely defined by associated real-valued penalty functions are investigated.

Using the class of penalty functions defined in the preceding chapter, a class of augmented Lagrangian functions is introduced in the fifth chapter. This class is the basis of the algorithm for the solution of nonlinear semidefinite programming problems presented later in the thesis. Various important properties of these functions are proven.

At the beginning of the sixth chapter, a (local) algorithm for the solution of nonlinear semidefinite programs is introduced. It is shown that the local algorithm is well defined; a special version of the implicit function theorem plays an important role here. Contractivity of the algorithm is established under the assumption that the penalty parameter is kept constant. Finally, it is explained why, in a certain neighborhood of the optimum, the solution of a non-convex semidefinite programming problem can be replaced by the solution of a sequence of unconstrained convex problems.

In the seventh chapter, two globalization techniques for the algorithm introduced in the preceding chapter are proposed. For each of the approaches certain convergence properties are established. At the end of this chapter, an algorithm combining the local and global convergence properties is presented.

The eighth chapter deals with the computational complexity of the algorithm. Complexity formulas are derived, which indicate serious problems of the (general) algorithm when applied to large scale problems. A remedy is given by the choice
of a special penalty function. It is shown that using this special penalty function, the computational complexity can be reduced significantly and various sparsity types in the problem data can be exploited.

In the ninth chapter of the thesis, several sub-aspects of the algorithm are discussed. Among them are: the solution of the (generally non-convex) unconstrained subproblems, multiplier update formulas, penalty parameter update schemes, solution techniques for linear systems, initialization formulas and stopping criteria.

In chapter ten, nonlinear semidefinite programming problems arising in structural optimization and control theory are presented.

The eleventh chapter describes and discusses the results of comprehensive numerical experiments. Apart from case studies in structural optimization and benchmarks with problems from control theory, comparisons of PENNON to alternative (linear) semidefinite programming solvers are given.

Finally, the twelfth chapter of this thesis offers an outlook on future research and possible improvements of the method.
Zusammenfassung
Im Rahmen dieser Arbeit wird eine Methode zur Lösung nichtlinearer semidefiniter Programme eingeführt. Ziel der nichtlinearen semidefiniten Programmierung ist es, Optimierungsprobleme mit nichtlinearen Matrixungleichungen als Nebenbedingungen zu lösen. Die beschriebene Methode kann als Verallgemeinerung der modifizierten Barriere-Methode, die zur Lösung nichtlinearer Programme mit reellen Ungleichheitsnebenbedingungen entwickelt wurde, betrachtet werden und lässt sich als solche in den allgemeinen Kontext der Augmented-Lagrange-Methoden einordnen. Neben einer umfassenden Konvergenzanalyse, in deren Rahmen sowohl lokale als auch globale Konvergenzaussagen erarbeitet werden, liegt ein weiterer Schwerpunkt der Arbeit in einer effizienten Umsetzung der Methode. Ein detaillierter Algorithmus wird vorgestellt, und numerische Aspekte des Computerprogrammes PENNON werden erörtert. Im letzten Teil der Arbeit wird die Durchführung umfangreicher numerischer Experimente mit dem Computerprogramm PENNON beschrieben und deren Ergebnisse diskutiert. Des Weiteren werden nichtlineare semidefinite Programme aus dem Bereich der Strukturoptimierung und der Steuerungstheorie genauer erläutert.

Im ersten Kapitel wird ein Überblick über weitere Arbeiten im Bereich der nichtlinearen semidefiniten Programmierung gegeben. Des Weiteren werden wichtige Anwendungen der nichtlinearen semidefiniten Programmierung genannt. Schliesslich werden die Hintergründe, die zur Idee der im Rahmen dieser Arbeit vorgestellten Methode führten, erläutert.

Im zweiten Abschnitt werden wichtige Begriffe aus dem Bereich der Matrixanalysis bereitgestellt. Insbesondere werden sogenannte primäre Matrixfunktionen definiert und (Richtungs-)Ableitungen von Matrixfunktionen diskutiert.

Im dritten Kapitel dieser Arbeit wird das Problem der nichtlinearen semidefiniten Programmierung genauer erläutert.
Des Weiteren werden verschiedene Voraussetzungen, darunter Optimalitätsbedingungen erster und zweiter Ordnung sowie Constraint Qualifications, gemacht und deren Bedeutung diskutiert.

Im vierten Kapitel wird eine Klasse von Penaltyfunktionen für Matrixnebenbedingungen eingeführt, die anhand von Beispielen konkretisiert wird. Matrix-Penaltyfunktionen, deren Definition auf Penaltyfunktionen basiert, die aus dem Bereich der nichtlinearen Programmierung (mit reellwertigen Nebenbedingungen) bekannt sind, werden auf ihre Eignung untersucht.

Mit Hilfe der im vorangegangenen Abschnitt eingeführten Penaltyfunktionen wird im fünften Kapitel eine Klasse von Augmented-Lagrange-Funktionen definiert. Die hier beschriebene Klasse von Augmented-Lagrange-Funktionen stellt die Grundlage des später eingeführten Algorithmus zur Lösung nichtlinearer semidefiniter Programme dar. Es werden verschiedene entscheidende Eigenschaften nachgewiesen.

Im sechsten Kapitel wird ein (lokaler) Algorithmus zur Lösung nichtlinearer semidefiniter Programme definiert. Hier liegt der theoretische Schwerpunkt der Arbeit. Zunächst werden Wohldefiniertheit des Algorithmus und Existenzaussagen untersucht. Hierbei spielt eine spezielle Version des Satzes über implizite Funktionen eine besondere Rolle. Anschliessend wird Kontraktivität des Algorithmus bei festgehaltenem Penaltyparameter bewiesen. Ferner wird erläutert, dass unter den gegebenen Voraussetzungen in einer lokalen Umgebung des Optimums die Lösung eines nichtkonvexen
semidefiniten Optimierungsproblems auf die Lösung einer Folge unrestringierter konvexer Optimierungsprobleme zurückgeführt werden kann.

Das siebte Kapitel befasst sich mit der Globalisierung des im sechsten Kapitel vorgestellten lokalen Algorithmus. Es werden zwei Globalisierungsansätze vorgestellt und entsprechende globale Konvergenzaussagen nachgewiesen.

Im achten Abschnitt wird die Rechenkomplexität unseres Algorithmus untersucht. Die errechneten Komplexitätsformeln legen die Wahl einer speziellen Penaltyfunktion nahe, mit deren Hilfe die Rechenkomplexität des Algorithmus erheblich reduziert werden kann. Des Weiteren wird erläutert, wie spezielle Datenstrukturen (dünnbesetzte Matrizen) ausgenutzt werden können.

Im neunten Teil der Arbeit werden verschiedene Unteraspekte des Algorithmus genauer erläutert. Dazu gehören: Lösung von (unter Umständen nicht-konvexen) unrestringierten Unterproblemen, Multiplikatorupdateformeln, Penaltyparameterupdatestrategien, die effiziente Lösung linearer Gleichungssysteme sowie die Initialisierung des Algorithmus und Abbruchkriterien.

Im zehnten Kapitel werden nichtlineare semidefinite Programme aus den Bereichen der Strukturoptimierung und der Steuerungstheorie genauer erläutert.

Im elften Abschnitt der Arbeit werden die Ergebnisse umfangreicher numerischer Experimente, darunter Fallstudien aus dem Bereich der Strukturoptimierung, Benchmarks mit Problemen aus dem Bereich der Steuerungstheorie und Vergleiche mit alternativer Software zur Lösung (linearer) semidefiniter Programme, berichtet und diskutiert.

Abschliessend werden im zwölften Abschnitt Ausblicke auf mögliche Weiterentwicklungen der vorgestellten Methode gegeben und offene Punkte dargelegt.
Contents
1 Introduction
2 Functions of Matrices
3 Problem Formulation and Basic Assumptions
   3.1 Problem Formulation
   3.2 Basic Assumptions
4 A Class of Penalty Functions
   4.1 Idea and Definition
   4.2 Examples
5 A Class of Augmented Lagrangians
6 A Locally Convergent Augmented Lagrangian Algorithm
   6.1 Basic Algorithm
   6.2 Local Convergence Properties
7 Globally Convergent Algorithms
   7.1 The Shifted Barrier Approach
   7.2 The General Approach
8 Complexity Issues and Consequences
   8.1 Complexity Analysis
   8.2 A Special Penalty Function
   8.3 Various Ways of Exploiting Sparsity
9 Algorithmic Details as Implemented in Pennon
   9.1 The Unconstrained Minimization Problem
      9.1.1 Globalized Newton's Method
      9.1.2 The Trust Region Method
      9.1.3 Globalized Newton's Method versus Trust Region Method
      9.1.4 How to Solve the Linear Systems?
   9.2 Update Strategies
      9.2.1 The Multiplier Update
      9.2.2 The Penalty Parameter Update
   9.3 Initialization and Stopping Criteria
      9.3.1 Initialization
      9.3.2 Stopping Criteria
   9.4 The PENNON Algorithm
10 Applications
   10.1 Structural Optimization
      10.1.1 Material Optimization
      10.1.2 Truss Topology Design
      10.1.3 Structural Optimization with Vibration and Stability Constraints
   10.2 Nonlinear SDPs arising in Control Theory
      10.2.1 The Static Output Feedback Problem
      10.2.2 Simultaneous Stabilization BMIs
11 Benchmarks and Numerical Experiments
   11.1 Numerical Experiments in Structural Optimization
      11.1.1 Global Stability and Self Vibration
      11.1.2 Maximizing the Minimal Eigenfrequency
   11.2 Numerical Experiments with COMPleib SOF Problems
   11.3 Numerical Experiments with Simultaneous Stabilization BMIs
   11.4 A Benchmark with a Collection of Random BMI Problems
   11.5 A Benchmark with Linear SDP Solvers
      11.5.1 SDPLIB
      11.5.2 HM Collection
12 Outlook
List of Figures
10.1 Cargo airplane; discretization and density plot
10.2 FMO design of a bicycle frame
11.1 Scenario Tower; ground structure
11.2 Scenario Tower; single-load result
11.3 Scenario Tower; vibration constraint added
11.4 Scenario Tower; stability constraint added
11.5 Scenario Cantilever; ground structure
11.6 Scenario Cantilever; single-load result
11.7 Scenario Cantilever; vibration constraint added
11.8 Scenario Cantilever; stability constraint added
11.9 Scenario Bridge; ground structure
11.10 Scenario Bridge; single-load result
11.11 Scenario Bridge; vibration constraint added
11.12 Scenario Bridge; stability constraint added
11.13 Scenario Plate – geometry and single-load result
11.14 Scenario Plate – vibration result (left) and stability result (right)
11.15 Scenario Plate-Narrow – geometry and single-load result
11.16 Scenario Plate-Narrow – vibration result (top) and stability result (bottom)
11.17 Scenario Two-Forces – geometry and single-load result
11.18 Scenario Two-Forces – vibration result (left) and stability result (right)
11.19 Maximizing Vibration Mode – geometry and result
List of Tables
11.1 Scenario Tower; problem dimensions
11.2 Scenario Tower; numerical results
11.3 Scenario Cantilever; problem dimensions
11.4 Scenario Cantilever; numerical results
11.5 Scenario Bridge; problem dimensions
11.6 Scenario Bridge; numerical results
11.7 Scenario Plate; problem dimensions
11.8 Scenario Plate; numerical results
11.9 Scenario Plate-Narrow; problem dimensions
11.10 Scenario Plate-Narrow; numerical results
11.11 Scenario Two-Forces; problem dimensions
11.12 Scenario Two-Forces; numerical results
11.13 Maximizing Vibration Mode; numerical results
11.14 Results of PENBMI on H2-BMI problems
11.15 Results of PENBMI on H∞-BMI problems
11.16 Collection of simultaneous stabilization problems
11.17 Results of PENBMI on simultaneous stabilization problems
11.18 Random Problems – Category II; numerical results
11.19 Random Problems – Category II; numerical results
11.20 Selected SDPLIB problems
11.21 SDPLIB (subset) – computational results
11.22 SDPLIB (subset) – summed/averaged computational results
11.23 HM collection
11.24 STRUCTOPT set – computational results
11.25 STRUCTOPT set – summed/averaged computational results
11.26 GRAPH set – computational results
11.27 GRAPH set – summed/averaged computational results
11.28 SOSTOOLS-GLOPTYPOLY set – computational results
11.29 SOSTOOLS-GLOPTYPOLY set – summed/averaged computational results
11.30 IMAGE set – computational results
11.31 IMAGE set – summed/averaged computational results
11.32 CHEMICAL set – computational results
11.33 CHEMICAL set – summed/averaged computational results
11.34 MIXED set – computational results
11.35 MIXED set – summed/averaged computational results
11.36 HM collection – summed/averaged computational results
Notation
N                        the natural numbers
R                        the real numbers
R^n                      real n-dimensional vectors over R
M^{m,n}                  m×n-matrices over R
S^m                      symmetric m×m-matrices over R
S^m_+                    positive semidefinite m×m-matrices over R
S^m_{++}                 positive definite m×m-matrices over R
(a,b)                    open interval in R
H^m(a,b)                 symmetric m×m-matrices over R with spectrum in (a,b)
C(x^*)                   cone of critical directions
O_m                      zero matrix of order m
I_m                      unit matrix of order m
[a_i]_{i=1}^n            column vector with entries a_i in the i-th component
[a_{i,j}]_{i,j=1}^m      m×m-matrix with entries a_{i,j} in the i-th row and j-th column
A_{i,j}                  matrix entry in the i-th row and j-th column
‖v‖                      Euclidean norm of a real vector
‖A‖                      Frobenius norm of a matrix
⟨v, w⟩                   standard inner product in R^n
tr                       trace operator
⟨A, B⟩                   standard inner product on S^m
f                        real-valued objective function
A                        matrix-valued constraint function
ϕ                        real-valued penalty function
Φ                        matrix penalty function
Ω                        set of feasible points
Ω_p                      relaxed set of feasible points
s_k(A)                   k-th eigenvector of matrix A
λ_k(A)                   k-th eigenvalue of matrix A
E_0, E_⊥, P_0, P_⊥       projection matrices
P_k(A)                   Frobenius covariance matrices
F'_x(x,U,p)              partial derivative of F w.r.t. x
F''_xx(x,U,p)            second order partial derivative of F w.r.t. x
A'_i                     partial derivative of A w.r.t. x_i
A''_{ij}                 second order partial derivative of A w.r.t. x_i and x_j
A^*_i                    partial derivative of A w.r.t. x_i, evaluated at x^*
A^*_{ij}                 second order partial derivative of A w.r.t. x_i and x_j, evaluated at x^*
DΦ(A)                    derivative of Φ w.r.t. A
DΦ(A)[B]                 directional derivative of Φ w.r.t. A in direction B
D²Φ(A)[B, C]             second order directional derivative of Φ w.r.t. A in directions B and C
∆ϕ                       divided difference formula of first order
∆²ϕ                      divided difference formula of second order
Chapter 1
Introduction
In recent years, (linear) semidefinite programming problems have received more and more attention. One of the main reasons is the large variety of applications leading to semidefinite programs (see [5], [40], for example). As a consequence, various algorithms for solving linear semidefinite programs have been developed, for instance interior point or dual scaling methods (see, for example, [40], [79], [82] and the references therein). However, the mathematical models in several applications lead to problems which cannot be formulated as linear, but only as more general nonlinear semidefinite programming problems. Such an observation can be made, for example, in control theory: There are many relevant control problems which boil down to linear semidefinite programming problems (see [20] for a long list); however, there are still fundamental problems for which no linear semidefinite formulation has been found yet. This was the reason why so-called BMI formulations (problems with bilinear matrix inequalities) of control problems became popular in the mid 1990s [39]. As there were, however, no computational methods for solving (non-convex) BMIs, or more generally speaking nonlinear semidefinite programs, several groups of researchers started to develop algorithms and software for the solution of BMI problems. For example, interior-point constrained trust region methods were proposed in [57] for a special class of BMIs. Further approaches were presented in [33] and [34]. In [34], sequential semidefinite programming, an extension of sequential quadratic programming, was used to solve LMI problems with additional nonlinear matrix equality constraints, while in [33] the augmented Lagrangian method was applied to a similar class of problems. Later, more applications appeared in which nonlinear semidefinite programming plays an important role, for example robust optimization (see [5]) and structural optimization (see, e.g., [4]).
Consequently, the interest in general purpose nonlinear semidefinite programming methods and software grew. Nowadays, there are several approaches for the solution of nonlinear semidefinite programs: For example, in [28], the method proposed in [34] is generalized to general purpose nonlinear semidefinite programming problems and a proof of global convergence is given. Another promising approach for the solution of general purpose nonlinear semidefinite programming problems is presented in [46]. The method is an extension of a primal predictor-corrector interior method to non-convex programs, where the corrector steps are based
on quadratic subprograms that combine aspects of line search and trust region methods. Just recently, a smoothing type algorithm for the solution of nonlinear semidefinite programming problems was introduced in [47] and further explored in [48].

Our main motivation for the development of our own general purpose semidefinite programming solver came from nonlinear semidefinite programming problems arising in structural optimization. The difficulty with these problems is that, depending on the type of constraints taken into consideration (for example, compliance constraints, strain constraints, constraints on global stability, etc.), one obtains large scale semidefinite programs with entirely different sparsity structures. On the other hand, there exist examples where the main difficulty lies not in the structure or size, but in the nonlinearity of the problem. Consequently, we came to the conclusion that there are at least two basic requirements to be satisfied by a solver that should be applicable to a wide class of (nonlinear) semidefinite optimization problems: First, the solver should be able to cope with serious non-linearities and, second, the solver should be able to exploit various types of sparsity in the problem data. At the time we started with the development of our algorithm, there was – to the best of our knowledge – no such solver available.

As a consequence of these special requirements, it seemed natural to search for a concept which combines the features of highly developed (standard) nonlinear programming algorithms with the abilities of (large-scale) linear semidefinite programming solvers. Our first and probably most obvious idea was to generalize interior point algorithms, which have been successfully applied to both nonlinear programming problems and linear semidefinite programming problems.
However, we encountered some difficulties in the generalization of the interior point idea, as, for example, the combination of the symmetrization issue with the requirement of preserving sparsity in the problem data. Consequently, we turned our attention to another promising method, the so-called Penalty-Barrier-Multiplier (PBM) method, which had recently been invented by Ben-Tal and Zibulevsky for the solution of convex programming problems (see [7]) and later adapted by Zibulevsky and Mosheyev for the application to linear semidefinite programming problems (see [63] and [87]). Nevertheless, the Zibulevsky/Mosheyev approach also turned out to have some serious drawbacks. First of all, the convergence theory was developed solely for convex programming problems and, second, the semidefinite version of the PBM method was based on eigenvalue decompositions of the constraint matrices, which are known to be critical numerical operations not only in terms of computational complexity, but also in terms of robustness. The first issue was not that crucial, because there were already successful attempts to generalize the PBM idea to non-convex problems (see, for example, [21], where a global convergence result is given) and there was a rich theory for the so-called modified barrier function (MBF) method, invented by R. Polyak (see, for example, [68], [69] and [70]), which is in certain respects the basis of the PBM methods. A remedy for the second issue was discovered when we investigated the concepts of the PBM and MBF methods: First, we made the observation that a generalized PBM function applied to a matrix constraint can result in a non-convex function, even in the case of convex problem data. Second, we found a way to avoid eigenvalue decompositions by using a (generalized version of a) special modified barrier function instead. A nice consequence of this observation was that with the special choice of the modified
barrier function, we saw a clear way to preserve sparsity patterns in the problem data. Consequently we decided to base our algorithm on the concept of the MBF method. Nevertheless, we did not want to restrict our theoretical considerations (convergence proofs, etc.) to an algorithm based on one specific penalty function. Therefore we decided to develop

• an augmented Lagrangian algorithm based on the MBF method combined with a wide class of penalty functions,
• a comprehensive local and global convergence theory, and
• a specialized code which allows for the solution of large scale nonlinear semidefinite programs by extensive utilization of problem-inherent sparsity structures.

Chapter 2
Functions of Matrices
In the course of this section we introduce a class of matrix functions defined on the space of symmetric m×m-matrices, henceforth denoted by S^m. The definition of these functions is based on a spectral decomposition of a symmetric matrix and a real-valued function.

Definition 2.1 (Primary matrix function) Let ϕ : R → R be a given function and A ∈ S^m be a given symmetric matrix. Let further A = SΛS^⊤, where Λ = diag(λ_1, ..., λ_m), be an eigenvalue decomposition of A. Then the primary matrix function Φ corresponding to ϕ is defined by

\[ \Phi : S^m \to S^m, \qquad A \mapsto S \begin{pmatrix} \varphi(\lambda_1) & 0 & \cdots & 0 \\ 0 & \varphi(\lambda_2) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \varphi(\lambda_m) \end{pmatrix} S^\top. \]

Definition 2.1 is a version of a more general definition for Hermitian matrices presented in [45].

Example 2.1 Consider the real-valued function

\[ \varphi : \mathbb{R} \to \mathbb{R}, \qquad x \mapsto x^2. \]

Then the primary matrix function Φ : S^m → S^m is given by

\[ \Phi(A) = S\,\mathrm{diag}\bigl(\lambda_1^2,\dots,\lambda_m^2\bigr)\,S^\top = S\Lambda^2 S^\top = A^2, \]

where SΛS^⊤ is an eigenvalue decomposition of A.
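The construction in Definition 2.1 translates directly into code. The following minimal NumPy sketch (an illustration added here, not part of the thesis code) applies a scalar function to a symmetric matrix through its eigenvalue decomposition; as in Example 2.1, ϕ(x) = x² reproduces A².

```python
import numpy as np

def primary_matrix_function(A, phi):
    """Apply the scalar function phi to the symmetric matrix A through
    its eigenvalue decomposition A = S diag(lambda) S^T."""
    lam, S = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
    return S @ np.diag(phi(lam)) @ S.T  # S diag(phi(lambda_1), ..., phi(lambda_m)) S^T

# As in Example 2.1, phi(x) = x^2 yields Phi(A) = A^2.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.allclose(primary_matrix_function(A, lambda x: x**2), A @ A))  # True
```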
By the following definition we introduce a notation for the set of symmetric matrices with spectrum in a certain interval.

Definition 2.2 Let (a,b) ⊂ R and m ∈ N be given. Then by H^m(a,b) we denote the set of all symmetric matrices A ∈ S^m with λ_1(A) > a and λ_m(A) < b, where λ_1(A), λ_2(A), ..., λ_m(A) are the increasingly ordered eigenvalues of A.

Next we want to discuss monotonicity and convexity of primary matrix functions. Before we are able to do this, we need to introduce a partial order on the space of symmetric matrices. We start with the following definition:

Definition 2.3 A matrix A ∈ S^m is said to be positive semidefinite if all eigenvalues of A are nonnegative.

In analogy to Definition 2.3, a matrix A ∈ S^m is said to be negative semidefinite if all eigenvalues of A are non-positive. It is well known that the set of all positive semidefinite m×m-matrices, denoted by S^m_+, is a pointed convex cone. This cone induces a partial order on S^m, the so-called Loewner (partial) order: For two given matrices A, B ∈ S^m, we define

\[ A \preceq B \;:\Leftrightarrow\; B - A \text{ is positive semidefinite} \]

and

\[ A \succeq B \;:\Leftrightarrow\; A - B \text{ is positive semidefinite.} \]

In accordance with this definition we use the notation A ⪰ 0 to indicate that the matrix A is positive semidefinite and A ⪯ 0 for negative semidefinite matrices. Now we are prepared for the following definition:

Definition 2.4 A given matrix function Φ : S^m → S^m is said to be monotone if for any two matrices A, B ∈ S^m with A ⪯ B the inequality Φ(A) ⪯ Φ(B) is satisfied. Furthermore, Φ is said to be convex if the inequality

\[ \Phi(\lambda A + (1-\lambda)B) \preceq \lambda\Phi(A) + (1-\lambda)\Phi(B) \]

is valid for all A, B ∈ S^m and any 0 ≤ λ ≤ 1.

The following example demonstrates that it is generally wrong to infer the monotonicity/convexity of the corresponding primary matrix function from the monotonicity/convexity of ϕ:

Example 2.2 Let ϕ be given as in Example 2.1. Obviously ϕ is monotone on the interval [0, ∞). Nevertheless, the same property does not hold for the corresponding primary matrix function. To see this, consider the following counterexample: Let

\[ A = \begin{pmatrix} 20 & 2 \\ 2 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 2 & 1 \\ 1 & 0.5 \end{pmatrix}. \]

Then the matrix

\[ A - B = \begin{pmatrix} 18 & 1 \\ 1 & 0.5 \end{pmatrix} \]

has two strictly positive eigenvalues. On the other hand, the matrix

\[ A^2 - B^2 = \begin{pmatrix} 399 & 39.5 \\ 39.5 & 3.75 \end{pmatrix} \]

has one positive and one negative eigenvalue.
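The counterexample of Example 2.2 is easy to check numerically. The short NumPy sketch below (illustrative only, not from the thesis) confirms that A − B has only positive eigenvalues while A² − B² is indefinite.

```python
import numpy as np

A = np.array([[20.0, 2.0], [2.0, 1.0]])
B = np.array([[2.0, 1.0], [1.0, 0.5]])

# A - B = [[18, 1], [1, 0.5]] has trace 18.5 and determinant 8 > 0,
# hence two strictly positive eigenvalues.
print(np.linalg.eigvalsh(A - B))

# A^2 - B^2 = [[399, 39.5], [39.5, 3.75]] has negative determinant,
# hence one negative and one positive eigenvalue.
print(np.linalg.eigvalsh(A @ A - B @ B))
```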
A useful criterion for the determination of monotonicity and convexity of a primary matrix function is based on the following definition and formulated in Lemma 2.1:
Definition 2.5 Let (a,b) be a given real interval, let t_1, t_2, ..., t_m be m real values and let ϕ : R → R be a twice continuously differentiable function. Then we define

\[ \Delta\varphi(t_i,t_j) = \begin{cases} \dfrac{\varphi(t_i)-\varphi(t_j)}{t_i-t_j}, & \text{for } i \neq j,\\[1.5ex] \varphi'(t_i), & \text{for } i = j, \end{cases} \]

\[ \Delta^2\varphi(t_i,t_j,t_k) = \begin{cases} \dfrac{\Delta\varphi(t_i,t_k)-\Delta\varphi(t_j,t_k)}{t_i-t_j}, & \text{for } i \neq j,\\[1.5ex] \dfrac{\Delta\varphi(t_i,t_j)-\Delta\varphi(t_k,t_j)}{t_i-t_k}, & \text{for } i = j \neq k,\\[1.5ex] \varphi''(t_i), & \text{for } i = j = k. \end{cases} \]

Lemma 2.1 Let (a,b) be a given real interval. Let further ϕ : (a,b) → R be a twice continuously differentiable function. Then:

a) The primary matrix function Φ corresponding to ϕ is monotone on H^m(a,b) if and only if the matrix [∆ϕ(t_i,t_j)]_{i,j=1}^m is positive semidefinite for each set of real values t_1, ..., t_m ∈ (a,b).

b) The primary matrix function Φ corresponding to ϕ is convex on H^m(a,b) if and only if the matrix [∆²ϕ(t_i,t_j,t_1)]_{i,j=1}^m is positive semidefinite for each set of real values t_1, ..., t_m ∈ (a,b).

Proof. See, for example, [45, p. 537ff].
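Lemma 2.1 a) suggests a simple numerical test: sample points t_1, ..., t_m and check the matrix of first divided differences for positive semidefiniteness. The NumPy sketch below (an added illustration; the test functions √x and x² are our choice) is consistent with the fact that √x is operator monotone on (0, ∞), whereas x², in line with Example 2.2, is not.

```python
import numpy as np

def loewner_matrix(t, phi, dphi):
    """Matrix of first divided differences [dphi(t_i, t_j)] from Definition 2.5."""
    m = len(t)
    L = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            L[i, j] = dphi(t[i]) if i == j else (phi(t[i]) - phi(t[j])) / (t[i] - t[j])
    return L

rng = np.random.default_rng(0)
t = rng.uniform(0.1, 10.0, size=6)   # sample points in (0, infinity)

# phi(x) = sqrt(x): the divided-difference matrix 1/(sqrt(t_i)+sqrt(t_j))
# is positive semidefinite (operator monotonicity).
L_sqrt = loewner_matrix(t, np.sqrt, lambda x: 0.5 / np.sqrt(x))
print(np.linalg.eigvalsh(L_sqrt).min() >= -1e-10)  # True

# phi(x) = x^2: the matrix [t_i + t_j] is indefinite for distinct points,
# so x^2 is not operator monotone, even on (0, infinity).
L_sq = loewner_matrix(t, lambda x: x**2, lambda x: 2 * x)
print(np.linalg.eigvalsh(L_sq).min() < 0)  # True
```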
Lemma 2.1 allows us to infer the monotonicity and convexity of the corresponding primary matrix function from certain properties of ϕ. This motivates the following definition:

Definition 2.6 A real-valued function ϕ defined on an interval (a,b) ⊆ R is

a) operator monotone on (a,b), if [∆ϕ(t_i,t_j)]_{i,j=1}^m ⪰ 0 for all t_1, ..., t_m ∈ (a,b) and all m ∈ N;

b) operator convex on (a,b), if [∆²ϕ(t_i,t_j,t_1)]_{i,j=1}^m ⪰ 0 for all t_1, ..., t_m ∈ (a,b) and all m ∈ N.

In the remainder of this section we want to discuss derivative formulas for a mapping composed of a twice continuously differentiable mapping A : R^n → S^m and a primary matrix function on S^m. A useful tool in this context is provided by the following definition:
Definition 2.7 (Frobenius covariance matrices) Let A : D ⊂ R^n → S^m be a given mapping. Let λ_1(x), λ_2(x), ..., λ_{µ(x)}(x) denote the µ(x) increasingly ordered distinct eigenvalues of A(x). Let further A(x) = S(x)Λ(x)S(x)⊤ be an eigenvalue decomposition of A(x), with Λ(x) = diag(λ_1(x), ..., λ_1(x), ..., λ_{µ(x)}(x), ..., λ_{µ(x)}(x)), where each eigenvalue occurs in its given multiplicity at x. Then the Frobenius covariance matrices are defined by

P_{A,i}(x) = S(x) diag(0, ..., 0, 1, ..., 1, 0, ..., 0) S(x)⊤,  i = 1, ..., µ(x),

where the ones in the diagonal matrix occur exactly in the positions of λ_i(x) in Λ(x). Some important properties of Frobenius covariance matrices are summarized in the following lemma:

Lemma 2.2 Let A(x) ∈ S^m be given. Then the following equations hold:

1. P_{A,1}(x) + ... + P_{A,µ(x)}(x) = I_m;
2. P_{A,i}(x) P_{A,j}(x) = 0 for all i ≠ j, i, j = 1, ..., µ(x);
3. P_{A,i}(x)^k = P_{A,i}(x) for all k ≥ 1, i = 1, ..., µ(x).

Proof. See, for example, [45, p. 403].
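The properties collected in Lemma 2.2 are easy to verify numerically. The following sketch (illustrative Python with a hypothetical test matrix, not part of the thesis) builds the Frobenius covariance matrices by grouping the columns of an eigenvalue decomposition according to distinct eigenvalues:

```python
import numpy as np

def frobenius_covariance_matrices(A, tol=1e-8):
    """P_i = S diag(0,..,1,..,0) S^T, with ones in the positions of the
    i-th distinct eigenvalue (Definition 2.7)."""
    lam, S = np.linalg.eigh(A)          # increasingly ordered eigenvalues
    projectors, i = [], 0
    while i < len(lam):
        j = i
        while j < len(lam) and lam[j] - lam[i] <= tol:   # group multiplicities
            j += 1
        projectors.append(S[:, i:j] @ S[:, i:j].T)
        i = j
    return projectors

# hypothetical symmetric test matrix with a double eigenvalue
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
A = Q @ np.diag([1.0, 1.0, 3.0]) @ Q.T

Ps = frobenius_covariance_matrices(A)
assert len(Ps) == 2                                   # mu(x) = 2 distinct eigenvalues
assert np.allclose(sum(Ps), np.eye(3))                # P_1 + ... + P_mu = I
assert np.allclose(Ps[0] @ Ps[1], 0)                  # P_i P_j = 0 for i != j
assert all(np.allclose(P @ P, P) for P in Ps)         # P_i^k = P_i
```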
For simplicity of notation we omit the subscript A and henceforth use the abbreviation P_k(x) = P_{A,k}(x) for all k = 1, ..., µ(x) and all x ∈ R^n.

In the scope of the following theorem we want to present formulas for partial derivatives of the mapping Φ ∘ A : D → S^m. In order to keep notation as brief as possible, we introduce the abbreviations

A′_i(x) = ∂A(x)/∂x_i  and  A″_{i,j}(x) = ∂²A(x)/(∂x_i ∂x_j)

for the first and second order partial derivatives of the mapping A.

Theorem 2.3 Let (a,b) ⊆ R, m ∈ N and A : D ⊂ R^n → S^m be a twice continuously differentiable mapping. Denote by λ_1(x), λ_2(x), ..., λ_{µ(x)}(x) the µ(x) increasingly ordered distinct eigenvalues of A(x) and let λ_1(x) ≥ a and λ_{µ(x)}(x) ≤ b for all x ∈ D. Let further ϕ : (a,b) → R be a twice continuously differentiable function and Φ be the corresponding primary matrix function. Then Φ ∘ A is twice differentiable for all x ∈ D and the following formulas hold:

∂Φ(A(x))/∂x_i = Σ_{k,l=1}^{µ(x)} ∆ϕ(λ_k(x), λ_l(x)) P_k(x) A′_i(x) P_l(x)
 = S(x) ( [∆ϕ(λ_k(x), λ_l(x))]_{k,l=1}^m • (S(x)⊤ A′_i(x) S(x)) ) S(x)⊤,

∂²Φ(A(x))/(∂x_i ∂x_j) = Σ_{k,l,s=1}^{µ(x)} ∆²ϕ(λ_k(x), λ_l(x), λ_s(x)) · (M + M⊤)
 + Σ_{k,l=1}^{µ(x)} ∆ϕ(λ_k(x), λ_l(x)) P_k(x) A″_{i,j}(x) P_l(x),

where the matrix M is given by

M = P_k(x) A′_i(x) P_l(x) A′_j(x) P_s(x)

and '•' denotes the Hadamard product, defined by A • B = [A_{i,j} B_{i,j}]_{i,j=1}^m for any pair of matrices A, B ∈ S^m.

Proof. Theorem 2.3 is a direct consequence of Theorem 6.6.30 in [45].
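The first formula of Theorem 2.3 can be checked against finite differences in its directional form. The sketch below is illustrative only and assumes ϕ = exp, so that the primary matrix function is the matrix exponential; it evaluates S([∆ϕ(λ_k, λ_l)] • (S⊤BS))S⊤ and compares it with a central difference quotient of expm:

```python
import numpy as np
from scipy.linalg import expm

def divided_differences(lam, phi, dphi):
    """[Delta-phi(lam_k, lam_l)]: first divided differences of phi."""
    m = len(lam)
    D = np.empty((m, m))
    for k in range(m):
        for l in range(m):
            if k == l:
                D[k, l] = dphi(lam[k])
            else:
                D[k, l] = (phi(lam[k]) - phi(lam[l])) / (lam[k] - lam[l])
    return D

def dPhi(A, B, phi, dphi):
    """DPhi(A)[B] = S ([Delta-phi] o (S^T B S)) S^T, 'o' the Hadamard product."""
    lam, S = np.linalg.eigh(A)
    return S @ (divided_differences(lam, phi, dphi) * (S.T @ B @ S)) @ S.T

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5)); A = (A + A.T) / 2
B = rng.normal(size=(5, 5)); B = (B + B.T) / 2

# phi = exp, so the primary matrix function is expm
h = 1e-6
fd = (expm(A + h * B) - expm(A - h * B)) / (2 * h)   # central finite difference
print(np.allclose(dPhi(A, B, np.exp, np.exp), fd, atol=1e-4))   # True
```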
Due to the construction of Φ ∘ A as a composition of two mappings, the first formula in Theorem 2.3 can be interpreted as a directional derivative of Φ at A(x) in direction A′_i(x). The following corollary generalizes this observation:

Corollary 2.4 Let the assumptions of Theorem 2.3 hold. Let further two matrices B, C ∈ S^m be given. Then the directional derivative of Φ(A(x)) with respect to A(x) in direction B is given by:

DΦ(A(x))[B] = ∂Φ(A(x))/∂A(x) [B]
 = Σ_{k,l=1}^{µ(x)} ∆ϕ(λ_k(x), λ_l(x)) P_k(x) B P_l(x)
 = S(x) ( [∆ϕ(λ_k(x), λ_l(x))]_{k,l=1}^m • (S(x)⊤ B S(x)) ) S(x)⊤.

Furthermore the second order directional derivative of Φ(A(x)) with respect to A(x) in directions B and C is given by:

D²Φ(A(x))[B; C] = ∂²Φ(A(x))/∂²A(x) [B; C]
 = D( DΦ(A(x))[B] )[C]
 = Σ_{k,l,s=1}^{µ(x)} ∆²ϕ(λ_k(x), λ_l(x), λ_s(x)) · (N + N⊤),

where the matrix N is given by N = P_k(x) B P_l(x) C P_s(x).

Chapter 3
Problem Formulation and Basic Assumptions
Throughout this chapter we briefly describe the class of semidefinite programming problems we want to solve by our algorithm. Furthermore we formulate basic assumptions involving constraint qualifications and optimality conditions for this class of semidefinite programming problems. For a comprehensive discussion of optimality conditions and constraint qualifications for semidefinite programming problems we refer the reader, for example, to [17], [72] or [40].
3.1 Problem Formulation
We consider the finite dimensional Hilbert space S^m introduced in Chapter 2, equipped with the inner product

⟨A, B⟩ = tr A⊤B = tr AB for all A, B ∈ S^m,

where tr denotes the trace operator. As we have already seen in the first chapter of this thesis, S^m_+ induces a partial order "⪰" respectively "⪯" on S^m. Using this notation, the basic semidefinite programming problem can be written as

min_{x ∈ R^n} f(x)  (SDP)
s.t. A(x) ⪯ 0.

Here f : R^n → R and A : R^n → S^m are twice continuously differentiable mappings. In the case when f and A are convex, we refer to the basic problem as (CSDP).

Remark. Problem (SDP) belongs to a wider class of optimization problems called conic programs. Other important representatives of this class are (standard) nonlinear
programming problems of the type
min_{x ∈ R^n} f(x)  (NLP)
s.t. g(x) ≤ 0,

where g maps from R^n to R^m, or problems involving so-called second order cone constraints. An interesting fact is that the problem (NLP) can be considered as a sub-case of the problem (SDP), namely when A(x) is a diagonal matrix. Consequently, the results in this thesis can be directly applied to inequality constrained nonlinear programming problems. Vice versa, it should be emphasized that many ideas presented in this thesis are motivated by existing results from the nonlinear programming literature, as, for example, [7], [68], [69], [70], [26], [25] and [21]. It should further be mentioned that it is beyond the scope of this thesis to generalize the algorithms and theory presented for semidefinite programs to general conic programs.
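The sub-case observation can be made concrete: for A(x) = diag(g_1(x), ..., g_m(x)) the eigenvalues of A(x) are exactly the values g_i(x), so A(x) ⪯ 0 holds if and only if g(x) ≤ 0 componentwise. A minimal sketch with a hypothetical constraint function g (illustration only):

```python
import numpy as np

def g(x):
    """A hypothetical inequality-constraint function g: R^2 -> R^3."""
    return np.array([x[0] ** 2 + x[1] - 1.0, -x[0], x[1] - 2.0])

def A(x):
    """The induced diagonal matrix operator of (SDP)."""
    return np.diag(g(x))

x = np.array([0.5, 0.3])
# A(x) negative semidefinite  <=>  g(x) <= 0 componentwise
assert (np.linalg.eigvalsh(A(x)).max() <= 0) == bool(np.all(g(x) <= 0))
```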
3.2 Basic Assumptions
Throughout this thesis the following assumptions on (SDP) are made:

(A1) x* = argmin{ f(x) | x ∈ Ω } exists, where Ω = { x ∈ R^n | A(x) ⪯ 0 }.

(A2) The Karush-Kuhn-Tucker necessary optimality conditions hold at x*. That means there exists U* ∈ S^m such that

f′(x*) + [ ⟨U*, A_i⟩ ]_{i=1}^n = 0
tr( U* A(x*) ) = 0
U* ⪰ 0, A(x*) ⪯ 0,  (3.1)

where we denote by A_i the i-th partial derivative of A at x* (i = 1, ..., n). The second condition is called the complementary slackness condition. Later we will prove that the complementary slackness condition can be given in the equivalent form

λ_i(U*) λ_i(A(x*)) = 0 for all i = 1, ..., m,

where λ_i(U*) and λ_i(A(x*)), i = 1, ..., m, denote the increasingly ordered eigenvalues of U* and A(x*), respectively. If in each of the above equations one factor is non-zero, the complementary slackness condition is said to be strict. Throughout this thesis we assume the strict complementary slackness condition to hold.

(A3) The nondegeneracy condition holds. This means that if for 1 ≤ r … The nondegeneracy condition is a well known constraint qualification for the problem (SDP) and implies that the matrix U* is unique (see, for example, [17]).

(A4) Define E_0 = (s_{m−r+1}, ..., s_m), where s_{m−r+1}, ..., s_m are the vectors introduced in assumption (A3). Then the cone of critical directions at x* is defined as

C(x*) = { h ∈ R^n : Σ_{i=1}^n h_i E_0⊤ A_i E_0 ⪯ 0, f′(x*)⊤h = 0 }.

Now we assume the following second order sufficient optimality condition to hold at (x*, U*): For all h ∈ C(x*) with h ≠ 0 the inequality

h⊤ ( L″_{xx}(x*, U*) + H(x*, U*) ) h > 0

is satisfied, where L is the classical Lagrangian for (SDP) defined as

L(x, U) = f(x) + ⟨U, A(x)⟩,

H(x*, U*) is defined by

H(x*, U*) = −2 [ ⟨A_i U*, [A(x*)]† A_j⟩ ]_{i,j=1}^n  (3.2)

(see, for example, [17, p. 490]) and [A(x*)]† is the Moore-Penrose inverse of A(x*).

(A5) Define

Ω_p = { x ∈ R^n | A(x) ⪯ bp I_m },

where b is a positive constant and p is a positive real value, which will play the role of a penalty parameter later.
Then we assume the following growth condition to hold:

∃ π > 0 and τ > 0 such that max{ ‖A(x)‖ | x ∈ Ω_π } ≤ τ.  (3.3)

It is clear that the existence of such a π implies the validity of (3.3) for any 0 ≤ π̄ < π. Later we will make use of the following lemma.

Lemma 3.1 In case the strict complementarity condition holds, the cone of critical directions simplifies to

C(x*) = { h ∈ R^n : Σ_{i=1}^n h_i E_0⊤ A_i E_0 = 0 }.

Proof. See, for example, [17].

Chapter 4

A Class of Penalty Functions

4.1 Idea and Definition

Using the concept of primary matrix functions introduced in Chapter 2 and recalling that the negative semidefiniteness of a matrix is equivalent to the non-positivity of its eigenvalues, the following idea is easy to understand: Let ϕ be a real-valued function defined on an interval (a,b) ⊆ R which penalizes positive arguments. Then the corresponding primary matrix function penalizes any matrix A violating the constraint A ⪯ 0, since ϕ penalizes each positive eigenvalue of A. This gives rise to the following definition:

Definition 4.1 Let ϕ : (−∞, b) → R be a function with the following properties:

(ϕ0) ϕ strictly convex, strictly monotone increasing and twice continuously differentiable,
(ϕ1) dom ϕ = (−∞, b) with 0 < b ≤ ∞,
(ϕ2) ϕ(0) = 0,
(ϕ3) ϕ′(0) = 1,
(ϕ4) ∃ C_{ϕ,σ} such that ϕ(σ/p) ≤ C_{ϕ,σ} for any σ < 0 and p > 0,
(ϕ5) ∃ C_{ϕ′,σ} such that ϕ′(σ/p) ≤ p C_{ϕ′,σ} for any σ < 0 and p > 0,
(ϕ6) ∃ C_{ϕ″,σ} such that ϕ″(σ/p) ≤ p² C_{ϕ″,σ} for any σ < 0 and p > 0,
(ϕ7) lim_{t→b} ϕ′(t) = ∞, lim_{t→−∞} ϕ′(t) = 0, ϕ′ convex,
(ϕ8) ϕ is operator monotone on (−∞, b),
(ϕ9) ϕ is operator convex on (−∞, b).

Define further ϕ_p(t) = p ϕ(t/p) for all t ∈ dom ϕ_p = (−∞, bp), where p is a positive penalty parameter. Then a matrix penalty function Φ_p : S^m → S^m is defined as

Φ_p : A(x) ↦ S(x) diag( ϕ_p(λ_1(x)), ϕ_p(λ_2(x)), ..., ϕ_p(λ_m(x)) ) S(x)⊤,

where A(x), S(x), λ_i(x), i = 1, ..., m, are defined as in Definition 2.1.

Remark. Using the matrix penalty function Φ_p we are able to rewrite our initial problem (SDP) as

min_{x ∈ R^n} f(x)  (SDP_{Φ_p})
s.t.
Φ_p(A(x)) ⪯ 0.

If we write down the Lagrangian for problem (SDP_{Φ_p}) we obtain the function

L_{Φ_p}(x, U) = f(x) + ⟨U, Φ_p(A(x))⟩,

which will be further considered in Section 5.

Remark. Let us briefly discuss the role of assumptions (ϕ8) and (ϕ9): As we will see later (compare Section 6), assumptions (ϕ8) and (ϕ9) are not used in the proofs of the main theoretical results presented in this thesis. Nevertheless they are important for the following reason: Φ_p is a convex and monotone matrix function for all m ∈ N if and only if assumptions (ϕ8) and (ϕ9) hold. Using this fact it is easy to see that for convex f and A, assumptions (ϕ8) and (ϕ9) guarantee that problem (SDP_{Φ_p}) is convex. In Section 5 we will see that in this case the function L_{Φ_p} is also convex. On the contrary, if assumptions (ϕ8) and (ϕ9) do not hold, problem (SDP_{Φ_p}) may be non-convex even for a linear mapping A. As a consequence it may happen that the augmented Lagrangian L_{Φ_p} is also non-convex for certain U ≻ 0. After reading Section 6 we will understand that this would mean that we have to solve a series of potentially non-convex optimization problems in order to solve a problem which is originally convex. This effect is avoided by the inclusion of assumptions (ϕ8) and (ϕ9) in Definition 4.1.

Remark. Replacing (ϕ4) by the weaker assumption

(ϕ4′) lim_{p→0} p ϕ(σ/p) = 0 for any σ < 0

enables us to use a wider class of penalty functions, as we will demonstrate below. Throughout this thesis we assume (ϕ4) to hold and explain how our results have to be modified if the weaker condition (ϕ4′) is used instead.

4.2 Examples

The goal of this section is to present a collection of typical penalty- and barrier-type functions from the nonlinear programming literature (see, for example, [68], [69], [7]
and [80]) and discuss their suitability with respect to the assumptions required in Definition 4.1:

• The logarithmic penalty function is defined as

ϕ_log(t) = −log(1 − t).

Looking at the derivative formulas

ϕ′_log(t) = 1/(1 − t) and ϕ″_log(t) = 1/(1 − t)²

it is obvious that (ϕ0) to (ϕ3), (ϕ4′) and (ϕ5) to (ϕ7) are satisfied. The validity of (ϕ8) and (ϕ9) is proved, for example, in [75].

• The hyperbolic penalty function is defined as

ϕ_hyp(t) = 1/(1 − t) − 1.

Here the derivatives are given by

ϕ′_hyp(t) = 1/(1 − t)² and ϕ″_hyp(t) = 2/(1 − t)³

and (ϕ0) to (ϕ7) are verified easily. Properties (ϕ8) and (ϕ9) have been established in [75]. Note that the functions ϕ_log and ϕ_hyp have been introduced in [68] as so-called modified barrier functions.

• The parabolic penalty function is defined as

ϕ_par(t) = −2√(1 − t) + 2.

The first and second derivatives of this function are

ϕ′_par(t) = (1 − t)^{−1/2} and ϕ″_par(t) = (1/2)(1 − t)^{−3/2}.

In analogy to the hyperbolic penalty function one can show that (ϕ0) to (ϕ3), (ϕ4′) and (ϕ5) to (ϕ9) are satisfied. A slight disadvantage of this function is that ϕ_par does not tend to infinity at the right boundary of its domain.

• The exponential penalty function is defined as

ϕ_exp(t) = e^t − 1.

Again the first and second derivative formulas

ϕ′_exp(t) = ϕ″_exp(t) = e^t

imply the validity of properties (ϕ0) to (ϕ7). Unfortunately the corresponding matrix function is neither monotone nor convex in the sense of assumptions (ϕ8) and (ϕ9) (see, for example, [45]). Note that in [30] the exponential penalty function is used to construct an alternative algorithm for the solution of linear semidefinite programs.

Another interesting class of functions was introduced in [6] and further studied in [7], [64] and [22].
The so-called penalty/barrier functions are constructed from analytic penalty functions ϕ in the following way:

ϕ̂(t) =
 ϕ(t)  for t ≤ r,
 at² + bt + c  for t ≥ r,  (4.1)

where r ∈ [−1, 1] and a, b and c are real numbers, which can be adjusted so that ϕ̂ is at least twice continuously differentiable at r. Obviously, in this case the right branch of ϕ̂ is nothing else than the quadratic C²-extrapolation of the left branch ϕ. Two important advantages of such a choice are that

– dom ϕ̂ = (−∞, ∞), which simplifies penalty parameter initializations and update strategies in penalty type methods,
– the second derivative of ϕ̂ is bounded, which is a very useful property when Newton type methods are applied.

Numerical studies in nonlinear programming indicated that optimization methods based on penalty/barrier functions as defined in (4.1) are often superior to methods based on classical barrier or modified barrier functions. Moreover it is quite easy to see that any penalty/barrier function ϕ̂ constructed from a function ϕ according to (4.1) satisfies the properties (ϕ0) to (ϕ7) if and only if the same properties hold for ϕ. However it can be shown that the matrix function corresponding to the right branch of the function ϕ̂ is – as a quadratic matrix function – non-monotone in the sense of Definition 2.4 (compare Example 2.2).

An interesting fact is that all valid candidates of penalty functions we have found so far have two common properties:

• they are analytic on their full domain and
• they have a pole on the positive real half axis.

The following theorem is a direct consequence of results presented in C. Loewner's paper on monotone matrix functions [29] and helps to interpret these observations:

Theorem 4.1 Let (a,b) ⊆ R.

a) If Φ is a monotone matrix function on H_m(a,b), then ϕ has at least 2m − 3 continuous derivatives.
b) ϕ is operator monotone on H_m(a,b) if and only if ϕ is the restriction of an analytic function defined on the upper complex half plane to the real interval (a,b). Furthermore each such function can be represented as

f(z) = αz + β + ∫_{−∞}^{∞} ( 1/(u − z) − u/(u² + 1) ) dµ(u),

where α ≥ 0, β ∈ R and dµ is a Borel measure that has no mass in (a,b).

An immediate consequence of Theorem 4.1 a) is that any operator monotone function ϕ must be analytic. Moreover if ϕ is such a function, if (a,b) = R and ϕ(x) < ∞ for all x ∈ R, we conclude from part b) of Theorem 4.1 that ϕ is linear and therefore violates several assumptions required by Definition 4.1. In other words, it is not possible to find a valid penalty function in the sense of Definition 4.1 with the full real line as its domain. From this point of view the functions ϕ_hyp, ϕ_log and ϕ_par are "optimal" choices.

Remark. Any appropriately scaled version and any convex combination of the functions ϕ_hyp, ϕ_log and ϕ_par results in a valid penalty function.

Remark. The functions ϕ_hyp, ϕ_log and ϕ_par are special cases of the following formula:

ϕ_α(t) = δ_α ∫_0^t (1 − s)^{α−1} ds + γ_α,

where −1 ≤ α < 1 and δ_α and γ_α are uniquely determined by assumptions (ϕ2) and (ϕ3). Depending on α we obtain the following classes of penalty functions:

0 < α < 1 : parabolic penalty functions,
α = 0 : logarithmic penalty function,
−1 ≤ α < 0 : hyperbolic penalty functions.

So far it can be proven that all members of the family of parabolic and at least some members of the family of hyperbolic penalty functions have the same properties as the representatives ϕ_hyp and ϕ_par presented earlier in this section (see, for example, [45]).

Chapter 5

A Class of Augmented Lagrangians

At the beginning of this chapter we define a class of augmented Lagrangians – the heart of our algorithm. The definition is based on the class of matrix penalty functions introduced in the preceding chapter.
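Before turning to the augmented Lagrangian, the matrix penalty functions of the preceding chapter can be sketched in a few lines. The following is illustrative Python using the hyperbolic penalty ϕ_hyp, not the implementation discussed later in the thesis. The checks at the end reflect the construction principle: since ϕ(0) = 0 and ϕ is strictly increasing, Φ_p(A) ⪯ 0 holds exactly when A ⪯ 0, and Φ_p only reshapes eigenvalues, so A and Φ_p(A) commute.

```python
import numpy as np

def phi_hyp(t):
    """Hyperbolic penalty: phi(t) = 1/(1-t) - 1, dom phi = (-inf, 1)."""
    return 1.0 / (1.0 - t) - 1.0

def Phi_p(A, p):
    """Matrix penalty function of Definition 4.1:
    Phi_p(A) = S diag(p * phi(lam_i / p)) S^T, here with phi = phi_hyp (b = 1)."""
    lam, S = np.linalg.eigh(A)
    assert lam.max() < p              # A must lie in the domain (-inf, b*p)
    return S @ np.diag(p * phi_hyp(lam / p)) @ S.T

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = -M @ M.T - 0.1 * np.eye(4)        # negative definite, hence feasible

Y = Phi_p(A, p=0.5)
# phi(0) = 0 and phi strictly increasing => Phi_p(A) <= 0 iff A <= 0
assert np.linalg.eigvalsh(Y).max() < 0
# Phi_p acts only on eigenvalues: A and Phi_p(A) commute
assert np.allclose(A @ Y, Y @ A)
```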
We start with some useful notations, which will be used throughout Chapters 5 and 6. Let λ_1 ≤ ... ≤ λ_{m−r} < λ_{m−r+1} = ... = λ_m = 0 denote the ordered eigenvalues of A(x*) and s_1, ..., s_m ∈ R^m the corresponding eigenvectors. Further define

• Λ = diag(λ_1, ..., λ_m) ∈ S^m, S = (s_1, ..., s_m) ∈ M^{m,m},
• Λ_0 = diag(λ_{m−r+1}, ..., λ_m) ∈ S^r, Λ_⊥ = diag(λ_1, ..., λ_{m−r}) ∈ S^{m−r},
• E_0 = (s_{m−r+1}, ..., s_m) ∈ M^{m,r}, E_⊥ = (s_1, ..., s_{m−r}) ∈ M^{m,m−r} and
• P_0 = E_0 E_0⊤ ∈ S^m, P_⊥ = E_⊥ E_⊥⊤ ∈ S^m.

Note that

• the columns of E_0 form a basis of Ker(A(x*)),
• the columns of E_⊥ form a basis of Im(A(x*)),
• A(x*) = SΛS⊤,
• P_0 + P_⊥ = SD_0S⊤ + SD_⊥S⊤ = I_m, where D_0 = diag(0, ..., 0, 1, ..., 1) with m−r zeros and r ones, and D_⊥ = diag(1, ..., 1, 0, ..., 0) with m−r ones and r zeros,
• using the notation of Definition 2.7, P_0 = P_{µ(x*)}(x*) and P_⊥ = Σ_{k=1}^{µ(x*)−1} P_k(x*), where P_k(x*) for k ∈ {1, ..., µ(x*)} are the Frobenius covariance matrices for the µ(x*) distinct eigenvalues of A(x*).

Next we define a class of augmented Lagrangians for the problem (SDP).

Definition 5.1 Given a twice differentiable function f : R^n → R, a twice differentiable matrix operator A : R^n → S^m and a penalty function Φ_p : S^m → S^m as given in Section 4, we define the following class of augmented Lagrangians:

F_Φ : R^n × S^m × R → R,
(x, U, p) ↦ f(x) + ⟨U, Φ_p(A(x))⟩.

The following theorem summarizes the most important properties of F_Φ:

Theorem 5.1 The augmented Lagrangian F_Φ(x, U, p) has the following properties:

a) F_Φ(x*, U*, p) = f(x*).

b) F′_{Φ,x}(x*, U*, p) = f′(x*) + [ ⟨U*, A_i⟩ ]_{i=1}^n = 0.

c) F″_{Φ,xx}(x*, U*, p) = L″_{xx}(x*, U*) + H_p(x*, U*) + p^{−1}M, where H_p(x*, U*) and M are symmetric matrices and H_p(x*, U*) → H(x*, U*) for p → 0, Ker(M) = C(x*) and y⊤My > 0 for all y ∉ C(x*).

If f and A are convex, then

d) F_Φ(x, U, p) is convex in x for all x ∈ Ω_p.

Remark.
Properties 5.1 a), b) and d) are shared by the classical Lagrangian function L associated with problem (SDP) and our augmented choice. However, as we will see later, property 5.1 c) is the reason for some advantages of F over L. For simplicity of notation we let F = F_Φ in the remainder of this section. For the proof of Theorem 5.1 we make use of the following lemmas.

Lemma 5.2 (von Neumann–Theobald) Let A, B ∈ S^m be given. Denote the ordered eigenvalues of A, B by λ_1(A) ≤ ... ≤ λ_m(A) and λ_1(B) ≤ ... ≤ λ_m(B). Then

tr(AB) ≤ Σ_{i=1}^m λ_i(A) λ_i(B),

where equality holds if and only if A and B have a simultaneous ordered spectral decomposition.

Proof. See, for example, [77].

Lemma 5.3 Let A ∈ S^m_+ and B ∈ S^m_− be given. Denote the ordered eigenvalues of A, B by λ_1(A) ≤ ... ≤ λ_m(A) and λ_1(B) ≤ ... ≤ λ_m(B). Then the following conditions are equivalent:

a) ⟨A, B⟩ = 0.
b) AB = 0.
c) A and B are simultaneously diagonalizable and λ_i(A) λ_i(B) = 0 for all i ∈ {1, ..., m}.

Proof. Let A = P² and B = −Q² with P, Q ∈ S^m (existence of P and Q follows from the semidefiniteness of A and −B).

"a) ⇒ b)": We have

‖PQ‖² = ⟨PQ, PQ⟩ = tr(QP²Q) = tr(Q²P²) = ⟨Q², P²⟩ = −⟨B, A⟩ = 0.

Consequently PQ = 0 and AB = −P(PQ)Q = 0.

"b) ⇒ c)": From Lemma 5.2 we get

0 = tr(AB) = ⟨A, B⟩ ≤ Σ_{i=1}^m λ_i(A) λ_i(B) ≤ 0,

since λ_i(A) ≥ 0 and λ_i(B) ≤ 0. Immediately we see λ_i(A) λ_i(B) = 0 for all i ∈ {1, ..., m}. Moreover we have

BA = B⊤A⊤ = (AB)⊤ = 0 = AB

and thus A and B are simultaneously diagonalizable.

"c) ⇒ a)": From the fact that A and B are simultaneously diagonalizable follows the existence of an orthonormal matrix S and diagonal matrices Λ_A, Λ_B such that A = SΛ_AS⊤ and B = SΛ_BS⊤. Thus we obtain

⟨A, B⟩ = tr( SΛ_AΛ_BS⊤ ) = Σ_{i=1}^m λ_i(A) λ_i(B) = 0

and we have shown ⟨A, B⟩ = 0.

Lemma 5.4

a) P_⊥ A(x*) P_⊥ = A(x*), P_0 A(x*) P_0 = P_⊥ A(x*) P_0 = P_0 A(x*) P_⊥ = O_m.

b) P_⊥ U* P_⊥ = P_⊥ U* P_0 = P_0 U* P_⊥ = O_m, P_0 U* P_0 = U*.
Proof.

a) Using the spectral decomposition of A(x*), we obtain

P_⊥ A(x*) P_⊥ = SD_⊥S⊤ SΛS⊤ SD_⊥S⊤ = SD_⊥ΛD_⊥S⊤ = SΛS⊤ = A(x*).

Exemplarily we show

P_0 A(x*) P_0 = SD_0S⊤ SΛS⊤ SD_0S⊤ = SD_0ΛD_0S⊤ = SO_mS⊤ = O_m.

b) From the first order optimality conditions, Lemma 5.2 and Lemma 5.3 it follows that U* = SΛ_{U*}S⊤, where Λ_{U*} = diag(λ_1(U*), ..., λ_m(U*)) and 0 = λ_1(U*) = ... = λ_{m−r}(U*) ≤ λ_{m−r+1}(U*) ≤ ... ≤ λ_m(U*). Now we obtain

P_0 U* P_0 = SD_0S⊤ SΛ_{U*}S⊤ SD_0S⊤ = SD_0Λ_{U*}D_0S⊤ = SΛ_{U*}S⊤ = U*.

Exemplarily we show

P_⊥ U* P_⊥ = SD_⊥S⊤ SΛ_{U*}S⊤ SD_⊥S⊤ = SD_⊥Λ_{U*}D_⊥S⊤ = SO_mS⊤ = O_m.

Now we are able to prove Theorem 5.1:

a) From the first order optimality conditions (3.1) and Lemma 5.3 we obtain λ_i(U*) = 0 for all i ∈ {1, ..., m−r}. By definition of Φ_p we have ϕ_p(λ_i(A(x*))) = 0 for all i ∈ {m−r+1, ..., m}. Now Lemma 5.3 implies ⟨U*, Φ_p(A(x*))⟩ = 0 and F(x*, U*, p) = f(x*).

b) The first equality follows from the fact that

U* = DΦ_p(A(x*))[U*],  (5.1)

which we will prove below. The second equality follows from (3.1). The following equation completes this part of the proof:

DΦ_p(A(x*))[U*]
= Σ_{k,l=1}^{µ(x*)} ∆ϕ_p(λ_k, λ_l) P_k(x*) U* P_l(x*)
= Σ_{k,l=1}^{µ(x*)} ∆ϕ_p(λ_k, λ_l) P_k(x*) ( P_0 U* P_0 ) P_l(x*)
= Σ_{k,l=1}^{µ(x*)} ∆ϕ_p(λ_k, λ_l) P_k(x*) P_{µ(x*)}(x*) U* P_{µ(x*)}(x*) P_l(x*)
= ∆ϕ_p(λ_{µ(x*)}, λ_{µ(x*)}) P_{µ(x*)}(x*) U* P_{µ(x*)}(x*)
= ϕ′_p(0) P_0 U* P_0 = U*.
c) Taking into account Theorem 2.3 and Lemma 5.4 and denoting

A_{i,j} = ∂²A(x*)/(∂x_i ∂x_j) for all i, j = 1, ..., n,

we have

F″_{xx}(x*, U*, p)
= f″(x*) + [ ⟨P_0 U* P_0, ∂²Φ_p(A(x*))/(∂x_i ∂x_j)⟩ ]_{i,j=1}^n
= f″(x*) + [ ⟨U*, P_0 ( Σ_{k,l=1}^{µ(x*)} ∆ϕ_p(λ_k, λ_l) P_k(x*) A_{i,j} P_l(x*) ) P_0⟩
 + ⟨U*, P_0 ( Σ_{k,l,s=1}^{µ(x*)} ∆²ϕ_p(λ_k, λ_l, λ_s) P_k(x*) A_i P_l(x*) A_j P_s(x*) ) P_0⟩
 + ⟨U*, P_0 ( Σ_{k,l,s=1}^{µ(x*)} ∆²ϕ_p(λ_k, λ_l, λ_s) P_k(x*) A_j P_l(x*) A_i P_s(x*) ) P_0⟩ ]_{i,j=1}^n
= f″(x*) + [ ⟨U*, ∆ϕ_p(λ_{µ(x*)}, λ_{µ(x*)}) P_{µ(x*)}(x*) A_{i,j} P_{µ(x*)}(x*)⟩ ]_{i,j=1}^n
 + [ ⟨U*, P_{µ(x*)}(x*) N_{i,j} P_{µ(x*)}(x*)⟩ ]_{i,j=1}^n + [ ⟨U*, P_{µ(x*)}(x*) N_{j,i} P_{µ(x*)}(x*)⟩ ]_{i,j=1}^n
= f″(x*) + [ ⟨U*, P_0 ϕ′_p(0) A_{i,j} P_0⟩ ]_{i,j=1}^n + 2 [ ⟨U*, N_{i,j}⟩ ]_{i,j=1}^n,

where the first two terms equal L″_{xx}(x*, U*) and

N_{i,j} = Σ_{k=1}^{µ(x*)} ∆²ϕ_p(λ_k, 0, 0) A_i P_k(x*) A_j for all i, j = 1, ..., n.

Now we calculate ∆²ϕ_p(λ_k, 0, 0). We consider two cases:

k = µ(x*): ∆²ϕ_p(λ_k, 0, 0) = ∆²ϕ_p(0, 0, 0) = p^{−1} ϕ″(0) → ∞ for p → 0.

k < µ(x*): ∆²ϕ_p(λ_k, 0, 0) = ( p ϕ(λ_k/p)/λ_k − 1 ) / λ_k → −1/λ_k for p → 0.  (5.2)

The latter limit follows from λ_k < 0 and the properties of ϕ. Defining

H_p(x*, U*) = 2 [ ⟨A_i U*, Σ_{k=1}^{µ(x*)−1} ∆²ϕ_p(λ_k, 0, 0) P_k(x*) A_j⟩ ]_{i,j=1}^n  (5.3)

and M = 2 [ ⟨A_i U*, ϕ″(0) P_0 A_j⟩ ]_{i,j=1}^n, we see that

F″_{xx}(x*, U*, p) = L″_{xx}(x*, U*) + H_p(x*, U*) + p^{−1} M,

where

H_p(x*, U*) → −2 [ ⟨U*, A_i ( Σ_{k=1}^{m−r} (1/λ_k) s_k s_k⊤ ) A_j⟩ ]_{i,j=1}^n = H(x*, U*)

for p → 0. To complete the proof, we will show next that Ker(M) = C(x*). This can be seen from

M = 2 ϕ″(0) [ ⟨U*, A_i P_0 A_j⟩ ]_{i,j=1}^n
= 2 ϕ″(0) [ ⟨U*, A_i ( Σ_{k=m−r+1}^m s_k s_k⊤ ) A_j⟩ ]_{i,j=1}^n
= 2 ϕ″(0) [ ⟨SΛ_{U*}S⊤, A_i ( Σ_{k=m−r+1}^m s_k s_k⊤ ) A_j⟩ ]_{i,j=1}^n
= 2 ϕ″(0) [ Σ_{l=m−r+1}^m u_l s_l⊤ A_i ( Σ_{k=m−r+1}^m s_k s_k⊤ ) A_j s_l ]_{i,j=1}^n
= 2 ϕ″(0) [ Σ_{k,l=m−r+1}^m u_l ( s_l⊤ A_i s_k )( s_k⊤ A_j s_l ) ]_{i,j=1}^n
= 2 ϕ″(0) B U*_r B⊤,
where U*_r = diag(u_{m−r+1}, ..., u_m, ..., u_{m−r+1}, ..., u_m) ∈ S^{r²} and the i-th row of B ∈ M^{n,r²} is defined by

b_i = ( s_{m−r+1}⊤ A_i s_{m−r+1}, ..., s_m⊤ A_i s_{m−r+1}, ..., s_{m−r+1}⊤ A_i s_m, ..., s_m⊤ A_i s_m )⊤.

From the strict complementarity condition it follows that U*_r ∈ S^{r²}_{++}, and by Lemma 3.1 we obtain Ker(M) = C(x*) and the claim that y⊤My > 0 for all y ∉ C(x*).

d) We have to show that ⟨U, Φ_p(A(x))⟩ is convex for all p > 0, U ⪰ 0 and x ∈ Ω_p = { x ∈ R^n | A(x) ⪯ bp I_m }. Given λ ∈ [0, 1] and x, y ∈ Ω_p, the convexity of A assures

λ A(x) + (1−λ) A(y) − A(λx + (1−λ)y) ⪰ 0.  (5.4)

By the monotonicity of Φ_p we get

Φ_p( λ A(x) + (1−λ) A(y) ) − Φ_p( A(λx + (1−λ)y) ) ⪰ 0.

The latter inequality combined with the convexity of Φ_p shows

λ Φ_p(A(x)) + (1−λ) Φ_p(A(y)) − Φ_p( A(λx + (1−λ)y) ) ⪰ 0.

Since U ⪰ 0, it follows that

tr( U⊤ [ λ Φ_p(A(x)) + (1−λ) Φ_p(A(y)) − Φ_p( A(λx + (1−λ)y) ) ] ) ≥ 0.

Finally, the latter inequality and the linearity of tr(·) imply the convexity of ⟨U, Φ_p(A(x))⟩.

The following corollary points out two important advantages of F over the classical Lagrangian L:

Corollary 5.5

a) There exists p_0 > 0 such that F(x, U*, p) is strongly convex in a neighborhood of x* for all p ≤ p_0.

b) There exist constants ǫ > 0 and p_0 > 0 such that

x* = argmin { F(x, U*, p) | x ∈ R^n, ‖x − x*‖ ≤ ǫ } for all p ≤ p_0.

In other words, if the optimal Lagrangian multiplier U* is known, problem (SDP) can be solved by solving one smooth optimization problem.

Proof.

a) Below we will show that there exists γ > 0 such that y⊤F″(x*, U*, p)y > γ for all y ∈ R^n with ‖y‖ = 1, whenever p is small enough. By Theorem 5.1 we know that

F″_{xx}(x*, U*, p) = L″_{xx}(x*, U*) + H_p(x*, U*) + p^{−1} M,

where

H_p(x*, U*) → H(x*, U*) for p → 0  (5.5)
and

Ker(M) = C(x*), y⊤My > 0 for all y ∉ C(x*).  (5.6)

We consider two cases:

y ∈ C(x*): From the second order optimality conditions for (SDP) it follows that there exists γ > 0 such that

y⊤ ( L″_{xx}(x*, U*) + H(x*, U*) ) y > γ for all y ∈ C(x*) with ‖y‖ = 1.  (5.7)

Now, (5.5) implies

y⊤ F″_{xx}(x*, U*, p) y = y⊤ ( L″_{xx}(x*, U*) + H_p(x*, U*) ) y > γ

for p small enough.

y ∉ C(x*): From (5.6) we see that y⊤ p^{−1} M y → ∞ for p → 0. On the other hand,

y⊤ ( L″_{xx}(x*, U*) + H_p(x*, U*) ) y

is bounded for all p ≤ p_0 for a p_0 ∈ R small enough. Thus we obtain y⊤ F″_{xx}(x*, U*, p) y > γ for p small enough.

b) is a direct consequence of a) and Theorem 5.1.

Remark. Note that the assertion of Corollary 5.5 a) is generally wrong for the classical Lagrangian L. Even for convex problem data f and A, strong convexity of L may not hold. Moreover it is a well known fact that the characterization in part b) of Corollary 5.5 may fail for the classical Lagrangian (see, for example, [71]). In case of the augmented Lagrangian, Corollary 5.5 a) guarantees that the last assertion of Corollary 5.5 b) holds even for non-convex f and A, provided p is small enough and the optimization is started close enough to x*.

Chapter 6

A Locally Convergent Augmented Lagrangian Algorithm

At the beginning of this chapter we present a basic algorithm for the solution of problem (SDP). Then, in the main part, we deal with the (local) convergence properties of this algorithm.

6.1 Basic Algorithm

On the basis of Definition 5.1 we define the following algorithm:

Algorithm 6.1.1 Let x^0 ∈ R^n, U^0 ∈ S^m_{++} and p^0 > 0 be given. Then for k = 0, 1, 2, ... repeat till a stopping criterion is reached:

(i) x^{k+1} = argmin_{x ∈ R^n} F(x, U^k, p^k),
(ii) U^{k+1} = DΦ_p(A(x^{k+1}))[U^k],
(iii) p^{k+1} ≤ p^k.

Obviously Algorithm 6.1.1 consists of three steps, an unconstrained minimization step and two update formulas:

• In the first step we calculate the global minimum of F with respect to x, where the multiplier and the penalty parameter are kept constant.
Of course, to find the global minimum of a generally non-convex function is a difficult task in practice, and in the worst case one may not be able to solve a single problem of type 6.1.1(i). We will return to this point at the end of Section 6, where we will show how Algorithm 6.1.1 can be adapted for local minima.

• The second step of Algorithm 6.1.1 is the multiplier update formula. The multiplier update formula can be interpreted as the directional derivative of Φ_p in direction of the current multiplier, evaluated at A(x^{k+1}).

• The third step describes the penalty parameter update formula, which is given in the most general form here. A possible update scheme will be developed in Section 9.1.

For a more detailed discussion on the particular steps of Algorithm 6.1.1 we refer to Section 9.

6.2 Local Convergence Properties

We start with the analysis of the local convergence properties of Algorithm 6.1.1. In particular we are interested in the behavior of the algorithm when it is started with a penalty parameter p^0 which is small enough and an initial multiplier matrix U^0 which is close enough to U*. These relations are formalized by the following definition:

Definition 6.1 Let λ_1(U*), ..., λ_m(U*) denote the eigenvalues of U* in increasing order. Let further 0 < ǫ < λ_{m−r+1}(U*) and λ_m(U*) < Θ be given. Then we define

(U*, p_0, δ, ǫ, Θ) = { (U, p) ∈ S^m × R : ‖U − U*‖ ≤ δ p^{−1}, p < p_0, … }

(G2) Then we prove the estimate

max { ‖x̂ − x*‖, ‖Û − U*‖ } ≤ C p ‖U − U*‖  (6.1)

for the pair x̂ and

Û = Û(U, p) = DΦ_p( A(x̂(U, p)) )[U],

where C is a constant which is independent of p. In other words we demonstrate that for appropriately chosen parameters p_0, δ, ǫ, Θ and for p …

Û = P_0 Û P_0 + P_⊥ Û P_0 + P_0 Û P_⊥ + P_⊥ Û P_⊥.  (6.2)

Note that the matrix P_0 Û P_0 is the orthogonal projection of Û onto the null space of A(x*). Motivated by this fact we define the matrices

Û_act = P_0 Û P_0,  Û_inact = P_⊥ Û P_0 + P_0 Û P_⊥ + P_⊥ Û P_⊥.
Next we introduce a variable transformation T = p(U − U*) and matrices

Û_0 = E_0⊤ Û E_0 ∈ S^r,  U*_0 = E_0⊤ U* E_0 ∈ S^r.

Furthermore we define the mapping Û_⊥ : R^n × S^m × R → S^m by

Û_⊥(x, T, p) = P_⊥ DΦ_p(A(x))[ p^{−1}T + U* ] P_⊥
 + P_0 DΦ_p(A(x))[ p^{−1}T + U* ] P_⊥
 + P_⊥ DΦ_p(A(x))[ p^{−1}T + U* ] P_0.  (6.3)

Using these definitions we observe

Û_act = E_0 Û_0 E_0⊤ and  (6.4)
Û_inact = Û_⊥( x̂, p(U − U*), p ).  (6.5)

We will further make use of the mapping h : R^n × S^m × R → R^n defined by

h(x, T, p) = [ ⟨Û_⊥(x, T, p), A′_i(x)⟩ ]_{i=1}^n.  (6.6)

Next we define local neighborhoods

• S(O_{S^m}, δ) = { T : ‖T‖ ≤ δ },
• S(U*_0, ǫ_0) = { V ∈ S^r : ‖V − U*_0‖ ≤ ǫ_0 } and
• S(x*, ǫ_0) = { x ∈ R^n : ‖x − x*‖ ≤ ǫ_0 } for given ǫ_0 > 0,

and, using these neighborhoods, the following pair of mappings:

Ψ_1 : S(x*, ǫ_0) × S(U*_0, ǫ_0) × S(0, δ) × (0, ∞) → R^n,
(x, Û_0, T, p) ↦ f′(x) + [ ⟨E_0 Û_0 E_0⊤, A′_i(x)⟩ ]_{i=1}^n + h(x, T, p),  (6.7)

Ψ_2 : S(x*, ǫ_0) × S(U*_0, ǫ_0) × S(0, δ) × (0, ∞) → S^r,
(x, Û_0, T, p) ↦ p E_0⊤ ( DΦ_p(A(x))[ p^{−1}T + U* ] ) E_0 − p Û_0.  (6.8)

A frequently used isometry is provided by the following definition:

Definition 6.2 Given a symmetric matrix A ∈ S^m we define the operator svec : S^m → R^{m(m+1)/2} as

svec(A) = ( a_{11}, √2 a_{12}, a_{22}, √2 a_{13}, √2 a_{23}, a_{33}, ... )⊤ ∈ R^{m(m+1)/2}.

Further we define the operator smat : R^{m(m+1)/2} → S^m as the inverse of svec.

Using Definition 6.2 we define r̃ = r(r+1)/2, m̃ = m(m+1)/2, u*_0 = svec(U*_0), the neighborhoods

S(u*_0, ǫ_0) = { û ∈ R^{r̃} : ‖û − u*_0‖ ≤ ǫ_0 } and S(O_{R^{m̃}}, δ_0) = { t ∈ R^{m̃} : ‖t‖ ≤ δ_0 },

and a mapping Ψ : S(x*, ǫ_0) × S(u*_0, ǫ_0) × S(0_{R^{m̃}}, δ_0) × (0, ∞) → R^{n+r̃} by

Ψ(x, û_0, t, p) = ( Ψ_1(x, smat(û_0), smat(t), p), svec( Ψ_2(x, smat(û_0), smat(t), p) ) ).  (6.9)

Now we are prepared to start with the proof of assertion (G1).
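The operators svec and smat of Definition 6.2 can be sketched directly (illustrative Python, not part of the thesis); the √2 scaling of the off-diagonal entries is what makes svec an isometry, i.e. ⟨A, B⟩ = svec(A)⊤svec(B):

```python
import numpy as np

def svec(A):
    """svec(A) = (a11, sqrt2*a12, a22, sqrt2*a13, sqrt2*a23, a33, ...)^T."""
    m = A.shape[0]
    out = []
    for j in range(m):
        for i in range(j + 1):
            out.append(A[i, j] if i == j else np.sqrt(2.0) * A[i, j])
    return np.array(out)

def smat(v):
    """Inverse of svec."""
    m = int((np.sqrt(8 * len(v) + 1) - 1) / 2)   # recover m from m(m+1)/2
    A = np.zeros((m, m))
    k = 0
    for j in range(m):
        for i in range(j + 1):
            A[i, j] = A[j, i] = v[k] if i == j else v[k] / np.sqrt(2.0)
            k += 1
    return A

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3)); A = (A + A.T) / 2
B = rng.normal(size=(3, 3)); B = (B + B.T) / 2

assert np.allclose(smat(svec(A)), A)                   # round trip
assert np.isclose(svec(A) @ svec(B), np.trace(A @ B))  # isometry: <A,B>
assert len(svec(A)) == 3 * 4 // 2                      # dimension m(m+1)/2
```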
The idea is to apply the following implicit function theorem, which is a slightly modified version of the Implicit Function Theorem 2 presented in [12, p. 12].

Theorem 6.1 (Implicit Function Theorem) Let S be an open subset of R^{m+n}, X̄ be a compact subset of R^m, and h : S → R^n be a function such that h ∈ C¹ on S. Assume that ∇_y h(x, y) exists and is continuous on S. Assume ȳ ∈ R^n is a vector such that the matrix ∇_y h(x̄, ȳ) is nonsingular for all x̄ ∈ X̄. Then there exist scalars ǫ > 0, δ > 0, and a function φ : S(X̄; ǫ) → S(ȳ; δ) such that φ ∈ C¹ on S(X̄; ǫ), ȳ = φ(x̄) for all x̄ ∈ X̄, and h[x, φ(x)] = 0 for all x ∈ S(X̄; ǫ). The function φ is unique in the sense that if x ∈ S(X̄; ǫ), y ∈ S(ȳ; δ) and h(x, y) = 0, then y = φ(x).

The following table identifies functions and sets in Theorem 6.1 with the corresponding functions and sets in the notation used in this thesis:

Theorem 6.1 | our notation
m | m̃ + 1
n | n + r̃
S ⊂ R^{m+n} | S(x*, ǫ_0) × S(u*_0, ǫ_0) × S(0_{R^{m̃}}, δ_0) × (0, ∞) ⊂ R^{(n+r̃)+(m̃+1)}
X̄ | {0_{R^{m̃}}} × I, where I ⊂ R_+ is a compact interval
ȳ | (x*, u*_0)
h | Ψ

In order to satisfy all assumptions of Theorem 6.1 we have to show that

• Ψ is continuous with respect to all variables,
• Ψ(x*, u*_0, 0, p) = 0,
• Ψ is continuously differentiable with respect to x and û_0 and
• Ψ′_{x,û_0}(x*, u*_0, 0, p) is nonsingular for all p small enough.

We start with discussing some basic properties of h, Ψ_1 and Ψ_2.

Lemma 6.2 If A is a twice continuously differentiable operator, then the function h defined in (6.6) is continuously differentiable with respect to x and

a) h(x*, 0, p) = 0.

b) h′_x(x*, 0, p) = H_p(x*, U*) and h′_x(x*, 0, p) → H(x*, U*) for p → 0, where H(x*, U*) is defined by formula (3.2).

Proof. The differentiability of h is obvious.
a) Taking into account (5.1) and Lemma 5.4 we obtain
$$\hat U_\perp(x^*,0,p) = P_\perp D\Phi_p(\mathcal{A}(x^*))[U^*]P_\perp + P_0 D\Phi_p(\mathcal{A}(x^*))[U^*]P_\perp + P_\perp D\Phi_p(\mathcal{A}(x^*))[U^*]P_0 = P_\perp U^*P_\perp + P_0U^*P_\perp + P_\perp U^*P_0 = 0. \tag{6.10}$$
Thus we obtain $h(x^*,0,p) = 0$.

b) Using Corollary 2.4 we get for any matrix $B \in S^m$
$$\frac{\partial}{\partial x_i}D\Phi_p(\mathcal{A}(x))[B] = D^2\Phi_p(\mathcal{A}(x))\big[B;\mathcal{A}_i'(x)\big] = \sum_{j,k,l=1}^{\mu(x)}\Delta^2\varphi_p(\lambda_j,\lambda_k,\lambda_l)P_j(x)\mathcal{A}_i'(x)P_k(x)BP_l(x) + \sum_{j,k,l=1}^{\mu(x)}\Delta^2\varphi_p(\lambda_j,\lambda_k,\lambda_l)P_j(x)BP_k(x)\mathcal{A}_i'(x)P_l(x).$$
Now, performing the same steps as in the proof of Theorem 5.1 c), we easily conclude
$$\frac{\partial}{\partial x_i}D\Phi_p(\mathcal{A}(x^*))[U^*] = \sum_{k=1}^{\mu(x^*)}\Delta^2\varphi_p(\lambda_k,0,0)P_k(x^*)\mathcal{A}_i'U^* + \sum_{k=1}^{\mu(x^*)}\Delta^2\varphi_p(\lambda_k,0,0)U^*\mathcal{A}_i'P_k(x^*).$$
Considering $U^*P_\perp = P_\perp U^* = 0$ we obtain
$$\hat U_{\perp,x}'(x^*,0,p) = \Big[\sum_{k=1}^{\mu(x^*)-1}\Delta^2\varphi_p(\lambda_k,0,0)P_k(x^*)\mathcal{A}_i'U^* + \sum_{k=1}^{\mu(x^*)-1}\Delta^2\varphi_p(\lambda_k,0,0)U^*\mathcal{A}_i'P_k(x^*)\Big]_{i=1}^n$$
and using (6.10)
$$h_x'(x^*,0,p) = 2\Big[\Big\langle U^*\mathcal{A}_i',\ \sum_{k=1}^{\mu(x^*)-1}\Delta^2\varphi_p(\lambda_k,0,0)P_k(x^*)\mathcal{A}_j'\Big\rangle\Big]_{i,j=1}^n.$$
The latter matrix is equal to the matrix $H_p(x^*,U^*)$ defined in the proof of Theorem 5.1. Thus we can show that the right-hand side of the latter equation converges to $H(x^*,U^*)$ as $p$ tends to 0, using exactly the same arguments as in the proof of Theorem 5.1.

Lemma 6.3 $\Psi_1(x^*,U_0^*,0,p) = \Psi_2(x^*,U_0^*,0,p) = 0$ for all $p > 0$.

Proof. Using (5.1) we can show that
$$\Psi_1(x^*,U_0^*,0,p) = f'(x^*)^\top + \Big[\big\langle E_0E_0^\top U^*E_0E_0^\top, \mathcal{A}_i'\big\rangle\Big]_{i=1}^n + h(x^*,0,p) = L_x'(x^*,U^*) = 0,$$
$$\Psi_2(x^*,U_0^*,0,p) = pE_0^\top\big(D\Phi_p(\mathcal{A}(x^*))[U^*]\big)E_0 - pE_0^\top U^*E_0 = pE_0^\top U^*E_0 - pE_0^\top U^*E_0 = 0$$
for all $p > 0$.

As a direct consequence of Lemma 6.3 we obtain
$$\Psi(x^*,u_0^*,0,p) = 0 \quad\text{for all } p > 0. \tag{6.11}$$
Next we investigate the differentiability of $\Psi_1$ and $\Psi_2$ and give formulas for partial derivatives.

Lemma 6.4 Let $\mathcal{A}$ and $f$ be twice continuously differentiable.
Then the functions $\Psi_1$ and $\Psi_2$ defined in (6.7) and (6.8) are continuously differentiable and the following formulas for the derivatives of $\Psi_1$ and $\Psi_2$ hold true:
$$\Psi_{1,x}'(x,\hat U_0,T,p) = f''(x) + \Big[\big\langle E_0\hat U_0E_0^\top, \mathcal{A}_{i,j}''(x)\big\rangle\Big]_{i,j=1}^n + h_x'(x,T,p),$$
$$\Psi_{2,x}'(x,\hat U_0,T,p) = p\Big[E_0^\top\Big(\sum_{j,k,l=1}^{\mu(x)}\Delta^2\varphi_p(\lambda_j(x),\lambda_k(x),\lambda_l(x))\big(P_j(x)\mathcal{A}_i'(x)P_k(x)(p^{-1}T+U^*)P_l(x) + P_j(x)(p^{-1}T+U^*)P_k(x)\mathcal{A}_i'(x)P_l(x)\big)\Big)E_0\Big]_{i=1}^n = p\Big[E_0^\top D^2\Phi_p(\mathcal{A}(x))\big[p^{-1}T+U^*;\mathcal{A}_i'(x)\big]E_0\Big]_{i=1}^n,$$
$$\Psi_{1,\hat U_0}'(x,\hat U_0,T,p) = \Big[E_0^\top\mathcal{A}_i'(x)E_0\Big]_{i=1}^n,$$
$$\Psi_{2,\hat U_0}'(x,\hat U_0,T,p) = -pE,\quad\text{where } E_{i,j} = 1 \text{ for all } i,j \in \{1,\dots,r\}.$$

Proof. The first and third formulas follow easily from the linearity of the trace operator. For the second formula we only note that
$$\frac{\partial}{\partial x_i}D\Phi_p(\mathcal{A}(x))[B] = D^2\Phi_p(\mathcal{A}(x))\big[B;\mathcal{A}_i'(x)\big]$$
for any matrix $B \in S^m$. The last equation can be seen directly.

Corollary 6.5
$$\Psi_{1,x}'(x^*,U_0^*,0,p) = L''(x^*,U^*) + H_p(x^*,U^*),$$
$$\Psi_{2,x}'(x^*,U_0^*,0,p) = \varphi''(0)\Big[E_0^\top\big(\mathcal{A}_iU^* + U^*\mathcal{A}_i\big)E_0\Big]_{i=1}^n,$$
$$\Psi_{1,\hat U_0}'(x^*,U_0^*,0,p) = \Big[E_0^\top\mathcal{A}_iE_0\Big]_{i=1}^n,$$
$$\Psi_{2,\hat U_0}'(x^*,U_0^*,0,p) = -pE,\quad\text{where } [E]_{i,j} = 1 \text{ for all } i,j \in \{1,\dots,r\}.$$

Proof. The first formula can be seen using Lemma 6.2 b). The second formula follows from Lemma 6.4 using the same arguments as in the proof of Theorem 5.1 c). The remaining formulas can be seen directly from Lemma 6.4.

In the following Lemma and Corollary we derive the partial derivative of $\Psi$ with respect to $x$ and $\hat u_0$.

Lemma 6.6 Let $\mathcal{A}$ and $f$ be twice continuously differentiable. Then the function $\Psi$ defined in (6.9) is continuously differentiable with respect to $x$, $\hat u_0$ and $t$, and the following formula holds:
$$\Psi_{x,\hat u_0}'(x,\hat u_0,t,p) = \begin{pmatrix} f''(x) + \Big[\big\langle E_0\hat U_0E_0^\top,\mathcal{A}_{i,j}''(x)\big\rangle\Big]_{i,j=1}^n + h_x'(x,T,p) & \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_i'(x)E_0\big)\Big]_{i=1}^{n\,\top} \\[2mm] p\Big[\mathrm{svec}\big(E_0^\top D^2\Phi_p(\mathcal{A}(x))[p^{-1}T+U^*;\mathcal{A}_i'(x)]E_0\big)\Big]_{i=1}^n & -pI_{\tilde r} \end{pmatrix}.$$

Proof.
Lemma 6.6 is a direct consequence of Lemma 6.4 and [76], where formulas for the derivatives of the functions svec and smat are presented.

Corollary 6.7
$$\Psi_{(p)}' := \Psi_{x,\hat u_0}'(x^*,u_0^*,0,p) = \begin{pmatrix} L''(x^*,U^*) + H_p(x^*,U^*) & \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_iE_0\big)\Big]_{i=1}^{n\,\top} \\[2mm] \Big[\mathrm{svec}\big(\varphi''(0)E_0^\top(\mathcal{A}_iU^* + U^*\mathcal{A}_i)E_0\big)\Big]_{i=1}^n & -pI_{\tilde r} \end{pmatrix}.$$

Proof. Corollary 6.7 follows directly from Lemma 6.6 and Corollary 6.5.

Along with $\Psi_{x,\hat u_0}'(x^*,u_0^*,0,p)$ we define
$$\Psi_{(0)}' = \lim_{p\to 0}\Psi_{x,\hat u_0}'(x^*,u_0^*,0,p) = \begin{pmatrix} L''(x^*,U^*) + H(x^*,U^*) & \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_iE_0\big)\Big]_{i=1}^{n\,\top} \\[2mm] \Big[\mathrm{svec}\big(\varphi''(0)E_0^\top(\mathcal{A}_iU^* + U^*\mathcal{A}_i)E_0\big)\Big]_{i=1}^n & 0 \end{pmatrix}. \tag{6.12}$$
The following Lemma and Corollary provide information about the regularity of $\Psi_{(0)}'$ and $\Psi_{(p)}'$.

Lemma 6.8 $\Psi_{(0)}'$ is nonsingular.

Proof. For any pair $(y,v) \in \mathbb{R}^{n+\tilde r}$ with $\Psi_{(0)}'(y,v) = 0$ the following equations hold:
$$\big(L''(x^*,U^*) + H(x^*,U^*)\big)y + \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_iE_0\big)^\top v\Big]_{i=1}^n = 0, \tag{6.13}$$
$$\varphi''(0)\sum_{i=1}^n y_i\,\mathrm{svec}\big(E_0^\top(\mathcal{A}_iU^* + U^*\mathcal{A}_i)E_0\big) = 0. \tag{6.14}$$
Now let $\lambda_1(U^*),\dots,\lambda_m(U^*)$ denote the increasingly ordered eigenvalues of $U^*$, define $\Lambda_{U_0}^* = \mathrm{diag}(\lambda_{m-r+1}(U^*),\dots,\lambda_m(U^*)) \in S^r_{++}$ and recall that $\lambda_1(U^*) = \dots = \lambda_{m-r}(U^*) = 0$ and $E_0 = (s_{m-r+1},\dots,s_m)$. Then we conclude
$$U^* = \sum_{k=m-r+1}^m \lambda_k(U^*)s_ks_k^\top = E_0\Lambda_{U_0}^*E_0^\top$$
and
$$\begin{aligned}
(6.14) &\Leftrightarrow \sum_{i=1}^n y_i\big(E_0^\top\mathcal{A}_iE_0\Lambda_{U_0}^*E_0^\top E_0 + E_0^\top E_0\Lambda_{U_0}^*E_0^\top\mathcal{A}_iE_0\big) = 0\\
&\Leftrightarrow \sum_{i=1}^n y_i\big(E_0^\top\mathcal{A}_iE_0\Lambda_{U_0}^* + \Lambda_{U_0}^*E_0^\top\mathcal{A}_iE_0\big) = 0\\
&\Leftrightarrow \Big[\sum_{i=1}^n y_i\big(s_k^\top\mathcal{A}_is_l\,\lambda_l(U^*) + \lambda_k(U^*)\,s_k^\top\mathcal{A}_is_l\big)\Big]_{k,l=m-r+1}^m = 0\\
&\Leftrightarrow \Big[\sum_{i=1}^n y_i\big(\lambda_k(U^*) + \lambda_l(U^*)\big)s_k^\top\mathcal{A}_is_l\Big]_{k,l=m-r+1}^m = 0\\
&\Leftrightarrow \big[\lambda_k(U^*) + \lambda_l(U^*)\big]_{k,l=m-r+1}^m \bullet \Big[\sum_{i=1}^n y_i\,s_k^\top\mathcal{A}_is_l\Big]_{k,l=m-r+1}^m = 0\\
&\Leftrightarrow \Big[\sum_{i=1}^n y_i\,s_k^\top\mathcal{A}_is_l\Big]_{k,l=m-r+1}^m = 0\\
&\Leftrightarrow y^\top\Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_iE_0\big)\Big]_{i=1}^n = 0. \hspace{4cm} (6.15)
\end{aligned}$$
Hence, multiplying (6.13) by $y^\top$ from the left, we get $y^\top\big(L''(x^*,U^*) + H(x^*,U^*)\big)y = 0$. Now from the second order optimality conditions for (SDP) follows either $y \notin \mathcal{C}(x^*)$ or $y = 0$. Therefore from (6.15) and Lemma 3.1 we conclude $y = 0$. Finally (6.13) together with assumption (A3) shows $v = 0$ and therefore $\Psi_{(0)}'$ is nonsingular.
Corollary 6.9 There exist $p_0 > 0$ and $\rho > 0$ such that $\|\Psi_{(p)}'^{-1}\| < \rho$ for all $p \le p_0$.

Proof. From Lemma 6.8 we conclude that
a) there exists $\gamma_0 > 0$ such that $\|\Psi_{(0)}'^{-1}\| \le \gamma_0$ and
b) there exists $\mu_0 > 0$ such that $\|\Psi_{(0)}'w\|^2 > \mu_0\|w\|^2$ for all $w \in \mathbb{R}^{n+\tilde r}$.

From b) and the continuity of $\Psi_{(p)}'$ with respect to $p$, which can be easily derived from Theorem 5.1, it follows that we find $p_0 > 0$ such that
$$\|\Psi_{(p)}'w\|^2 > \frac{\mu_0}{2}\|w\|^2 \quad\text{for all } w \in \mathbb{R}^{n+\tilde r} \text{ and all } p \le p_0.$$
Consequently the smallest eigenvalue of $\Psi_{(p)}'$ is larger than $\frac{\mu_0}{2}$, $\Psi_{(p)}'$ is nonsingular and there exists $\rho > 0$ independent of $p$ such that $\|\Psi_{(p)}'^{-1}\| < \rho$ for all $p \le p_0$.

Proposition 6.10 There exist $\delta > 0$ and $p_0 > 0$ small enough such that for given $\Theta > \lambda_m(U^*)$, any $0 < \epsilon < \lambda_{m-r+1}(U^*)$ and any $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$ there exists a vector $\hat x = \hat x(U,p)$ such that the equation $F_x'(\hat x,U,p) = 0$ holds.

Proof. We have shown that we find $p_0 > 0$ such that
- $\Psi(x^*,u_0^*,0,p) = 0$,
- $\Psi$ is continuously differentiable with respect to $x$ and $\hat u_0$ and
- $\Psi_{(p)}'$ is nonsingular

for all $0 < p \le p_0$. Thus Theorem 6.1 guarantees the existence of $\delta_0 > 0$ and smooth functions $x(t,p)$ and $\hat u_0(t,p)$ defined uniquely in a neighborhood
$$S(\mathcal{K},\delta_0) = \big\{(t,p) \in \mathbb{R}^{\tilde m+1} : \|t\| \le \delta_0,\ p \in [p_1,p_0]\big\}$$
of the compact set $\mathcal{K}$ such that
- $\Psi(x(t,p),\hat u_0(t,p),t,p) = 0$ for all $(t,p) \in S(\mathcal{K},\delta_0)$ and
- $x(0,p) = x^*$, $\hat u_0(0,p) = \hat u_0^*$ for any $p \in [p_1,p_0]$.

Recalling that $\mathrm{smat}(t) = T = p(U - U^*)$ and $\|v\| = \|\mathrm{smat}(v)\|$, we conclude $\mathrm{svec}(p(U - U^*)) \in S(\mathcal{K},\delta)$ for any $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. Thus
$$\hat x = \hat x(U,p) = x\big(\mathrm{svec}(p(U - U^*)),p\big)$$
and
$$\hat U_0 = \hat U_0(U,p) = \mathrm{smat}\big(\hat u_0(\mathrm{svec}(p(U - U^*)),p)\big)$$
exist and the following equations hold true:
$$f'(\hat x) + \Big[\big\langle E_0\hat U_0E_0^\top, \mathcal{A}_i'(\hat x)\big\rangle\Big]_{i=1}^n + h(\hat x,p(U - U^*),p) = 0, \tag{6.16}$$
$$pE_0^\top\big(D\Phi_p(\mathcal{A}(\hat x))[U]\big)E_0 - p\hat U_0 = 0. \tag{6.17}$$
From the latter equation we see that $\hat U_0 = E_0^\top\big(D\Phi_p(\mathcal{A}(\hat x))[U]\big)E_0$ and after substitution of $\hat U_0$ in (6.16) we obtain
$$\begin{aligned}
0 &= f'(\hat x) + \Big[\big\langle P_0\big(D\Phi_p(\mathcal{A}(\hat x))[U]\big)P_0, \mathcal{A}_i'(\hat x)\big\rangle\Big]_{i=1}^n + \Big[\big\langle \hat U_\perp(\hat x,p(U - U^*),p), \mathcal{A}_i'(\hat x)\big\rangle\Big]_{i=1}^n\\
&= f'(\hat x) + \Big[\big\langle D\Phi_p(\mathcal{A}(\hat x))[U], \mathcal{A}_i'(\hat x)\big\rangle\Big]_{i=1}^n\\
&= f'(\hat x) + \Big[\big\langle U, D\Phi_p(\mathcal{A}(\hat x))[\mathcal{A}_i'(\hat x)]\big\rangle\Big]_{i=1}^n\\
&= F_x'(\hat x,U,p).
\end{aligned}$$
This completes the proof of Proposition 6.10.

Later we will show that $\hat x = \mathrm{argmin}_{x\in\mathbb{R}^n}F(x,U,p)$ and therefore complete the proof of assertion (G1). Next we will prove assertion (G2). We start with the following Lemma.

Lemma 6.11 There exist $\delta > 0$ and $p_0 > 0$ small enough such that for given $\Theta > \lambda_m(U^*)$, any $0 < \epsilon < \lambda_{m-r+1}(U^*)$ and any $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$ the estimate
$$\max\big\{\|\hat x - x^*\|,\ \|E_0^\top(\hat U - U^*)E_0\|\big\} \le Cp\|U - U^*\|$$
holds, where $C$ is a constant independent of $p$.

Proof. We start by rewriting equations (6.16) and (6.17) using $T = \mathrm{smat}(t)$ and $\tilde U_0(t,p) = \mathrm{smat}(\hat u_0(t,p))$:
$$f'(x(t,p)) + \Big[\big\langle E_0\tilde U_0(t,p)E_0^\top, \mathcal{A}_i'(x(t,p))\big\rangle\Big]_{i=1}^n + h(x(t,p),T,p) = 0, \tag{6.18}$$
$$p\,\mathrm{svec}\Big(E_0^\top D\Phi_p\big(\mathcal{A}(x(t,p))\big)\big[p^{-1}T + U^*\big]E_0\Big) - p\hat u_0(t,p) = 0. \tag{6.19}$$
Now, differentiating identity (6.18) with respect to $t$ we get
$$\Big(f''(x(t,p)) + \Big[\big\langle E_0\tilde U_0(t,p)E_0^\top, \mathcal{A}_{i,j}''(x(t,p))\big\rangle\Big]_{i,j=1}^n\Big)\cdot x_t'(t,p) + \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_i'(x(t,p))E_0\big)\Big]_{i=1}^{n\,\top}\cdot\hat u_{0,t}'(t,p) + h_t'(x(t,p),T,p) = 0, \tag{6.20}$$
where
$$x_t'(t,p) = \frac{\partial}{\partial t}\big(x_j(t,p),\ j = 1,\dots,n\big) \in \mathbb{R}^{n\times\tilde m}$$
and
$$\hat u_{0,t}'(t,p) = \frac{\partial}{\partial t}\big(\hat u_{0,j}(t,p),\ j = 1,\dots,\tilde r\big) \in \mathbb{R}^{\tilde r\times\tilde m}.$$
Differentiation of (6.19) with respect to $t$ yields:
$$p\hat u_{0,t}'(t,p) = p\frac{\partial}{\partial t}\mathrm{svec}\Big(E_0^\top D\Phi_p\big(\mathcal{A}(x(t,p))\big)\big[p^{-1}T + U^*\big]E_0\Big) = p\Big[\mathrm{svec}\Big(E_0^\top D^2\Phi_p\big(\mathcal{A}(x(t,p))\big)\big[p^{-1}T + U^*;\mathcal{A}_i'(x(t,p))\big]E_0\Big)\Big]_{i=1}^n\cdot x_t'(t,p) + \Big[\mathrm{svec}\Big(E_0^\top\big(D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]\big)E_0\Big)\Big]_{j=1}^{\tilde m}, \tag{6.21}$$
where $e_j$ denotes the $j$-th unit vector in $\mathbb{R}^{\tilde m}$.
Next we calculate $h_t'(x(t,p),T,p)$:
$$\begin{aligned}
h_t'(x(t,p),T,p) &= \frac{\partial}{\partial t}\Big[\big\langle \hat U_\perp(x(t,p),T,p), \mathcal{A}_i'(x(t,p))\big\rangle\Big]_{i=1}^n\\
&= \Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_\perp,\ \tfrac{\partial}{\partial t_j}D\Phi_p\big(\mathcal{A}(x(t,p))\big)[p^{-1}T + U^*]\big\rangle\Big]_{(i,j)\in I\times J}\\
&\quad + \Big[\big\langle P_0\mathcal{A}_i'(x(t,p))P_\perp,\ \tfrac{\partial}{\partial t_j}D\Phi_p\big(\mathcal{A}(x(t,p))\big)[p^{-1}T + U^*]\big\rangle\Big]_{(i,j)\in I\times J}\\
&\quad + \Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_0,\ \tfrac{\partial}{\partial t_j}D\Phi_p\big(\mathcal{A}(x(t,p))\big)[p^{-1}T + U^*]\big\rangle\Big]_{(i,j)\in I\times J}\\
&\quad + \Big[\big\langle \hat U_\perp(x(t,p),T,p), \mathcal{A}_{i,j}''(x(t,p))\big\rangle\Big]_{i,j=1}^n\cdot x_t'(t,p), \hspace{3cm} (6.22)
\end{aligned}$$
where $I\times J = \{(i,j) : i = 1,\dots,n \text{ and } j = 1,\dots,\tilde m\}$ and
$$\frac{\partial}{\partial t_j}D\Phi_p\big(\mathcal{A}(x(t,p))\big)[p^{-1}T + U^*] = \Big[D^2\Phi_p\big(\mathcal{A}(x(t,p))\big)\big[p^{-1}T + U^*;\mathcal{A}_i'(x(t,p))\big]\Big]_{i=1}^{n\,\top}\cdot\frac{\partial x(t,p)}{\partial t_j} + \frac{1}{p}D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]$$
for all $j = 1,\dots,\tilde m$. Using $B_j(t,p) = D^2\Phi_p\big(\mathcal{A}(x(t,p))\big)\big[p^{-1}T + U^*;\mathcal{A}_j'(x(t,p))\big]$ for $j = 1,\dots,n$, recalling that
$$h'(x(t,p),t,p) = \Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_\perp, B_j(t,p)\big\rangle\Big]_{i,j=1}^n + \Big[\big\langle P_0\mathcal{A}_i'(x(t,p))P_\perp, B_j(t,p)\big\rangle\Big]_{i,j=1}^n + \Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_0, B_j(t,p)\big\rangle\Big]_{i,j=1}^n + \Big[\big\langle \hat U_\perp(x(t,p),T,p), \mathcal{A}_{i,j}''(x(t,p))\big\rangle\Big]_{i,j=1}^n$$
and defining
$$M(t,p) = \frac{1}{p}\Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_\perp,\ D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]\big\rangle\Big]_{(i,j)\in I\times J} + \frac{1}{p}\Big[\big\langle P_0\mathcal{A}_i'(x(t,p))P_\perp,\ D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]\big\rangle\Big]_{(i,j)\in I\times J} + \frac{1}{p}\Big[\big\langle P_\perp\mathcal{A}_i'(x(t,p))P_0,\ D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]\big\rangle\Big]_{(i,j)\in I\times J},$$
from (6.22) we obtain
$$h_t'(x(t,p),t,p) = h'(x(t,p),t,p)\cdot x_t'(t,p) + M(t,p). \tag{6.23}$$
Now defining
$$N(t,p) = \Big[\mathrm{svec}\Big(E_0^\top\big(D\Phi_p\big(\mathcal{A}(x(t,p))\big)[\mathrm{smat}(e_j)]\big)E_0\Big)\Big]_{j=1}^{\tilde m}$$
and combining Lemma 6.6, (6.20), (6.21) and (6.23), we obtain the system
$$\Psi_{x,\hat u_0}'\big(x(t,p),\hat u_0(t,p),t,p\big)\begin{pmatrix} x_t'(t,p)\\ \hat u_{0,t}'(t,p)\end{pmatrix} = -\begin{pmatrix} M(t,p)\\ N(t,p)\end{pmatrix}. \tag{6.24}$$
Next we consider the special case $t = 0$. From (6.23) and Lemma 6.2 b) we see that
$$h_t'(x(0,p),0,p) = H_p(x^*,U^*)\cdot x_t'(0,p) + M_0, \tag{6.25}$$
where $M_0 = M(0,p)$ with
$$M_{0,i,j} = \frac{1}{p}\Big[\mathrm{svec}\Big(D\Phi_p\big(\mathcal{A}(x^*)\big)\big[\underbrace{P_0\mathcal{A}_iP_\perp + P_\perp\mathcal{A}_iP_0 + P_\perp\mathcal{A}_iP_\perp}_{=:M_1}\big]\Big)\Big]_j = \frac{1}{p}\Big[\mathrm{svec}\Big(\sum_{k,l=1}^{\mu(x^*)}\Delta\varphi_p(\lambda_k,\lambda_l)\big(P_k(x^*)M_1P_l(x^*) + P_l(x^*)M_1P_k(x^*)\big)\Big)\Big]_j = \frac{1}{p}\Big[\mathrm{svec}\Big(\sum_{k,l=1}^{\mu(x^*)-1}\Delta\varphi_p(\lambda_k,\lambda_l)P_k(x^*)\mathcal{A}_iP_l(x^*) + \sum_{k=1}^{\mu(x^*)-1}\frac{\varphi_p(\lambda_k)}{\lambda_k}\big(P_0\mathcal{A}_iP_k(x^*) + P_k(x^*)\mathcal{A}_iP_0\big)\Big)\Big]_j.$$
Therefore we find the estimate
$$\|M_0\| \le \Big(2\frac{|C_{\varphi,-\infty}|}{\sigma} + C_{\varphi',\sigma}\Big)\Big(\sum_{i=1}^n\|\mathrm{svec}(\mathcal{A}_i)\|^2\Big)^{1/2} =: C_M.$$
Now from (6.20) at $t = 0$ and (6.25) we conclude
$$\big(L_{xx}''(x^*,U^*) + H_p(x^*,U^*)\big)\cdot x_t'(0,p) + \Big[\mathrm{svec}\big(E_0^\top\mathcal{A}_iE_0\big)\Big]_{i=1}^{n\,\top}\cdot\hat u_{0,t}'(0,p) = -M_0. \tag{6.26}$$
Evaluating (6.21) at $t = 0$ and using Corollary 6.5, we obtain
$$\varphi''(0)\Big[E_0^\top\big(\mathcal{A}_iU^* + U^*\mathcal{A}_i\big)E_0\Big]_{i=1}^n\cdot x_t'(0,p) - pI_{\tilde r}\cdot\hat u_{0,t}'(0,p) = -N_0, \tag{6.27}$$
where $N_0 = N(0,p)$ with $N_{0,i,j} = \big[\mathrm{svec}\big(E_0^\top\mathrm{smat}(e_j)E_0\big)\big]_i$, and thus $\|N_0\| \le 1$. Combining (6.26) and (6.27) we obtain the system
$$\Psi_{(p)}'\begin{pmatrix} x_t'(0,p)\\ \hat u_{0,t}'(0,p)\end{pmatrix} = -\begin{pmatrix} M_0\\ N_0\end{pmatrix}$$
or equivalently
$$\begin{pmatrix} x_t'(0,p)\\ \hat u_{0,t}'(0,p)\end{pmatrix} = -\Psi_{(p)}'^{-1}\begin{pmatrix} M_0\\ N_0\end{pmatrix}. \tag{6.28}$$
Taking into account Corollary 6.9 and the estimates for $\|M_0\|$ and $\|N_0\|$ above, we see that
$$\max\big\{\|x_t'(0,p)\|, \|\hat u_{0,t}'(0,p)\|\big\} \le \rho(C_M + 1).$$
Furthermore, for $\delta_0$ small enough and any $(t,p) \in \{(t,p) \in \mathbb{R}^{\tilde m+1} : \|t\| \le \delta_0,\ p \le p_0\}$, the inequality
$$\left\|\Psi_{x,\hat u_0}'^{-1}\big(x(\tau t,p),\hat u_0(\tau t,p),\tau t,p\big)\begin{pmatrix} M(x(\tau t,p),\hat u_0(\tau t,p))\\ N(x(\tau t,p),\hat u_0(\tau t,p))\end{pmatrix}\right\| \le 2\rho(C_M + 1) =: C_0$$
holds true for any $\tau \in [0,1]$. Also we have
$$\begin{pmatrix} x(t,p) - x^*\\ \hat u_0(t,p) - u_0^*\end{pmatrix} = \begin{pmatrix} x(t,p) - x(0,p)\\ \hat u_0(t,p) - \hat u_0(0,p)\end{pmatrix} = -\int_0^t \Psi_{x,\hat u_0}'^{-1}\big(x(\nu,p),\hat u_0(\nu,p),\nu,p\big)\begin{pmatrix} M(x(\nu,p),\hat u_0(\nu,p))\\ N(x(\nu,p),\hat u_0(\nu,p))\end{pmatrix}d\nu = -\int_0^1 \Psi_{x,\hat u_0}'^{-1}\big(x(\tau t,p),\hat u_0(\tau t,p),\tau t,p\big)\begin{pmatrix} M(x(\tau t,p),\hat u_0(\tau t,p))\\ N(x(\tau t,p),\hat u_0(\tau t,p))\end{pmatrix}t\,d\tau.$$
From the latter equation we obtain
$$\left\|\begin{pmatrix} x(t,p) - x^*\\ \hat u_0(t,p) - u_0^*\end{pmatrix}\right\| \le \int_0^1\left\|\Psi_{x,\hat u_0}'^{-1}\big(x(\tau t,p),\hat u_0(\tau t,p),\tau t,p\big)\begin{pmatrix} M(x(\tau t,p),\hat u_0(\tau t,p))\\ N(x(\tau t,p),\hat u_0(\tau t,p))\end{pmatrix}\right\|\|t\|\,d\tau \le C_0\|t\|,$$
and therefore
$$\max\big\{\|x(t,p) - x^*\|, \|\hat u_0(t,p) - \hat u_0^*\|\big\} \le C_0\|t\|.$$
Consequently, for $U = p^{-1}\mathrm{smat}(t) + U^*$ there exists $C_1 > 0$ independent of $p$ such that
$$\max\big\{\|\hat x(U,p) - x^*\|,\ \|E_0^\top(\hat U(U,p) - U^*)E_0\|\big\} \le pC_1\|U - U^*\| \tag{6.29}$$
for all $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$, and the proof of Lemma 6.11 is complete.
Lemma 6.11 proves the assertion of (G2) for the primal variable and the component of the multiplier associated with the "active part" of $\mathcal{A}(x^*)$. Based on these results, we are now going to show the remaining part of estimate (6.1). We start with the following inequality:
$$\|\hat U - U^*\| \le \|P_0(\hat U - U^*)P_0\| + 2\|P_0(\hat U - U^*)P_\perp\| + \|P_\perp(\hat U - U^*)P_\perp\|.$$
From (6.29) we conclude
$$\|P_0(\hat U - U^*)P_0\| = \sqrt{\big\langle P_0(\hat U - U^*)P_0,\ P_0(\hat U - U^*)P_0\big\rangle} = \sqrt{\mathrm{tr}\big(P_0(\hat U - U^*)P_0(\hat U - U^*)\big)} = \sqrt{\mathrm{tr}\big(E_0E_0^\top(\hat U - U^*)E_0E_0^\top(\hat U - U^*)\big)} = \big\|E_0^\top(\hat U - U^*)E_0\big\| \le pC_1\|U - U^*\|.$$
Again from estimate (6.29) we obtain
$$\max\big\{\|\hat x(U,p) - x^*\|,\ \|E_0^\top(\hat U(U,p) - U^*)E_0\|\big\} \le \delta C_1 \tag{6.30}$$
for all $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. From equation (6.30) we conclude that for given $\epsilon_1 > 0$ we find $\delta > 0$ small enough such that
$$\|\hat x(U,p) - x^*\| \le \epsilon_1, \tag{6.31}$$
$$\|\hat U_0(U,p) - U_0^*\| \le \epsilon_1 \tag{6.32}$$
for all $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. Using these inequalities we are able to prove the following Lemma:

Lemma 6.12 There exist $\delta > 0$ small enough and a constant $C_2 > 0$ such that $\lambda_i(\mathcal{A}(\hat x(U,p))) \le pC_2$ for all $i = 1,\dots,m$ and all $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. Furthermore the constant $C_2$ does not depend on $p$.

Proof. From estimate (6.32) we see that we find $\delta > 0$ small enough such that
$$\|\hat U_0(U,p)\| = \big\|E_0^\top\big(D\Phi_p(\mathcal{A}(\hat x(U,p)))[U]\big)E_0\big\| \le \|U_0^*\| + \epsilon_1. \tag{6.33}$$
Since $\hat U_0(U,p) \succeq 0$ we see from (6.33) that
$$0 \le \mathrm{tr}\,\hat U_0(U,p) \le \sqrt m\big(\|U_0^*\| + \epsilon_1\big). \tag{6.34}$$
Now let
$$\mathcal{A}(\hat x(U,p)) = S\big(\mathcal{A}(\hat x(U,p))\big)\Lambda\big(\mathcal{A}(\hat x(U,p))\big)S\big(\mathcal{A}(\hat x(U,p))\big)^\top$$
be an eigenvalue decomposition of $\mathcal{A}(\hat x(U,p))$, denote by $\lambda_k(\hat x) = \lambda_k(\mathcal{A}(\hat x(U,p)))$ the corresponding eigenvalues and by $s_k(\hat x) = s_k(\mathcal{A}(\hat x(U,p)))$ the corresponding eigenvectors.
Then we conclude from inequality (6.31), the continuity of $\mathcal{A}$ and the continuity of the eigenvectors that we are able to find, for any given $0 < \epsilon_2 < 1$, a $\delta > 0$ small enough such that the following estimates hold:
$$|s_i(\hat x)^\top P_0s_j(\hat x)| \le \epsilon_2 \quad \forall\, i,j = 1,\dots,m,\ i \ne j, \tag{6.35}$$
$$s_i(\hat x)^\top P_0s_i(\hat x) \le \epsilon_2 \quad \forall\, i = 1,\dots,m-r, \tag{6.36}$$
$$s_i(\hat x)^\top P_0s_i(\hat x) \ge 1 - \epsilon_2 \quad \forall\, i = m-r+1,\dots,m. \tag{6.37}$$
Due to $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$ we know that
$$s_i^\top Us_i \ge \epsilon \quad\text{for all } i = m-r+1,\dots,m.$$
Consequently we find $\delta > 0$ small enough such that the following inequalities hold:
$$s_i(\hat x)^\top Us_i(\hat x) \ge \frac{3}{4}\epsilon \quad\text{for all } i = m-r+1,\dots,m. \tag{6.38}$$
Using the abbreviations $Q(\hat x) = [\Delta\varphi_p(\lambda_k(\hat x),\lambda_l(\hat x))]_{k,l=1}^m$ and $S(\hat x) = S(\mathcal{A}(\hat x(U,p)))$ we obtain
$$\mathrm{tr}\,\hat U_0(U,p) = \mathrm{tr}\Big(E_0^\top\big(D\Phi_p(\mathcal{A}(\hat x(U,p)))[U]\big)E_0\Big) = \mathrm{tr}\Big(E_0^\top S(\hat x)\big(Q(\hat x)\bullet S(\hat x)^\top US(\hat x)\big)S(\hat x)^\top E_0\Big) = \mathrm{tr}\Big(S(\hat x)^\top E_0E_0^\top S(\hat x)\big(Q(\hat x)\bullet S(\hat x)^\top US(\hat x)\big)\Big) = \Big\langle S(\hat x)^\top P_0S(\hat x)\bullet S(\hat x)^\top US(\hat x),\ Q(\hat x)\Big\rangle. \tag{6.39}$$
Next we define $Z(\hat x) = S(\hat x)^\top P_0S(\hat x)\bullet S(\hat x)^\top US(\hat x)$ and subdivide the matrices $Z(\hat x)$ and $Q(\hat x)$ in the following way:
$$Z(\hat x) = \begin{pmatrix} Z_1 & Z_2\\ Z_2^\top & Z_3\end{pmatrix} \quad\text{and}\quad Q(\hat x) = \begin{pmatrix} Q_1 & Q_2\\ Q_2^\top & Q_3\end{pmatrix},$$
where $Z_1,Q_1 \in S^{m-r}$, $Z_2,Q_2 \in M^{m-r,r}$ and $Z_3,Q_3 \in S^r$. Now from (6.39) we see that
$$\mathrm{tr}\,\hat U_0(U,p) = \langle Z_1,Q_1\rangle + \langle Z_3,Q_3\rangle + 2\sum_{j=1}^r\sum_{i=1}^{m-r}(Z_2)_{i,j}(Q_2)_{i,j}.$$
From the positive semidefiniteness of $Q(\hat x)$ and $Z(\hat x)$ we conclude $\langle Z_1,Q_1\rangle \ge 0$, hence from (6.34) we see
$$\langle Z_3,Q_3\rangle + 2\sum_{j=1}^r\sum_{i=1}^{m-r}(Z_2)_{i,j}(Q_2)_{i,j} \le \sqrt m\big(\|U_0^*\| + \epsilon_1\big). \tag{6.40}$$
Furthermore from the convexity of $\varphi$ we get $0 \le (Q)_{i,j} \le (Q)_{j,j}$ for all $i \le j$, $j = m-r+1,\dots,m$. Now from estimates (6.35) to (6.37) and the fact that
$$|s_i(\hat x)^\top Us_j(\hat x)| \le \|U\| \le \Theta \tag{6.41}$$
for all $i,j = 1,\dots,m$, we see that we are able to find $\delta > 0$ small enough such that
$$|Z_{i,j}| \le \frac{\epsilon}{6(j-1)} \quad\text{for all } i < j,\ j = m-r+1,\dots,m,$$
and we conclude
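The trace identity (6.39) rests on the divided-difference representation $D\Phi_p(\mathcal{A}(\hat x))[U] = S(\hat x)\big(Q(\hat x)\bullet S(\hat x)^\top US(\hat x)\big)S(\hat x)^\top$ used throughout this chapter. A minimal numerical sketch follows; the penalty choice $\varphi(t) = e^t - 1$ is purely illustrative (the thesis keeps $\varphi$ general), with $\varphi_p(t) = p\varphi(t/p)$ and the diagonal of $Q$ carrying $\varphi_p'(\lambda_k) = \varphi'(\lambda_k/p)$:

```python
import numpy as np

def dphi_p(A, U, p, phi, dphi):
    """Directional derivative D Phi_p(A)[U] of Phi_p(A) = p * phi(A / p),
    via the divided-difference representation: with A = S Lam S^T,
    D Phi_p(A)[U] = S (Q * (S^T U S)) S^T, where
    Q_kl = (phi_p(lam_k) - phi_p(lam_l)) / (lam_k - lam_l) for lam_k != lam_l
    and Q_kk = phi_p'(lam_k) = phi'(lam_k / p)."""
    lam, S = np.linalg.eigh(A)
    fp = p * phi(lam / p)                        # phi_p(lam)
    d = lam[:, None] - lam[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        Q = (fp[:, None] - fp[None, :]) / d
    same = np.isclose(d, 0.0)                    # equal (or nearly equal) eigenvalues
    rows = np.broadcast_to(dphi(lam / p)[:, None], Q.shape)
    Q[same] = rows[same]                         # diagonal/confluent entries: phi_p'
    return S @ (Q * (S.T @ U @ S)) @ S.T
```

Two sanity checks: for $A = cI$ all divided differences collapse to $\varphi'(c/p)$, so the derivative acts as a scalar multiple of $U$; and for $U = I$ the formula reduces to $S\,\mathrm{diag}(\varphi_p'(\lambda_k))\,S^\top$, the matrix $\Phi_p'(\mathcal{A}(x))$ appearing in the multiplier update of Chapter 7.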
$$2\sum_{j=m-r+1}^m\sum_{i=1}^{j-1}(Z)_{i,j}(Q)_{i,j} \le \frac{\epsilon}{3}\sum_{k=m-r+1}^m(Q)_{k,k}.$$
On the other hand from estimates (6.35) to (6.38) we obtain
$$(Z_3)_{i,i} \ge \frac{2}{3}\epsilon$$
for $\delta > 0$ small enough and thus
$$\langle Z_3,Q_3\rangle + 2\sum_{j=1}^r\sum_{i=1}^{m-r}(Z_2)_{i,j}(Q_2)_{i,j} \ge \frac{2\epsilon}{3}\sum_{k=m-r+1}^m(Q)_{k,k} - \frac{\epsilon}{3}\sum_{k=m-r+1}^m(Q)_{k,k} = \frac{\epsilon}{3}\sum_{k=m-r+1}^m(Q)_{k,k}.$$
Now we immediately obtain from (6.40)
$$\sum_{k=m-r+1}^m(Q)_{k,k} \le \frac{3\sqrt m}{\epsilon}\big(\|U_0^*\| + \epsilon_1\big),$$
consequently
$$(Q)_{k,k} \le \frac{3\sqrt m}{\epsilon}\big(\|U_0^*\| + \epsilon_1\big) \quad\text{for all } k = m-r+1,\dots,m$$
and finally
$$\lambda_{\max} \le p\,(\varphi')^{-1}\Big(\frac{3\sqrt m}{\epsilon}\big(\|U_0^*\| + \epsilon_1\big)\Big).$$

Remark. If we replace all estimates involving the Frobenius norm by the spectral norm in the proof of Lemma 6.12, and if we further choose $\epsilon$ close enough to $\lambda_{\min}(U_0^*)$, we obtain the estimate
$$\varphi_p'(\lambda_{\max}) \le 2\frac{\lambda_{\max}(U_0^*)}{\lambda_{\min}(U_0^*)}.$$
This observation shows that the constant $C_2$ in Lemma 6.12 is closely related to the condition number of the matrix $U_0^*$.

Next we use Taylor expansion in a neighborhood of $x^*$ for the calculation of the terms $\|P_0(\hat U - U^*)P_\perp\|$ and $\|P_\perp(\hat U - U^*)P_\perp\|$. Therefore we introduce the function
$$\tilde U : \mathbb{R}^n\times S^m_+\times\mathbb{R} \to S^m_+,\qquad (x,U,p) \mapsto D\Phi_p(\mathcal{A}(x))[U],$$
for which the equation $\tilde U(\hat x(U,p),U,p) = \hat U(U,p)$ holds. Now let $P_* \in \{P_0,P_\perp\}$ and define $\Delta x = \hat x(U,p) - x^*$. Taylor's formula guarantees the existence of $\xi \in S(x^*,\epsilon_1)$ such that
$$P_*\big(\tilde U(x,U,p) - U^*\big)P_\perp = P_*\big(D\Phi_p(\mathcal{A}(x))[U] - U^*\big)P_\perp = P_*\big(D\Phi_p(\mathcal{A}(x^*))[U] - U^*\big)P_\perp + R(\xi)\Delta x,$$
where $R(\xi) = \big[P_*D^2\Phi_p(\mathcal{A}(\xi))[U;\mathcal{A}_i'(\xi)]P_\perp\big]_{i=1}^n$. The following Lemma provides an upper bound for the norm of the remainder term $R(\xi)$.

Lemma 6.13 $\|R(\xi)\| \le C_3$, where $C_3 \in \mathbb{R}$ is a constant independent of $p$.

Proof.
Let us denote the increasingly ordered eigenvalues of $\mathcal{A}(x)$ by $\lambda_i(x)$, $i = 1,\dots,m$, the corresponding Frobenius covariance matrices by $P_i(x)$, and the number of distinct eigenvalues of $\mathcal{A}(x)$ by $\mu(x)$. Then we obtain for all $i = 1,\dots,n$
$$P_*D^2\Phi_p(\mathcal{A}(\xi))[U;\mathcal{A}_i'(\xi)]P_\perp = \sum_{k_1,k_2,k_3=1}^{\mu(\xi)}\Delta^2\varphi_p\big(\lambda_{k_1}(\xi),\lambda_{k_2}(\xi),\lambda_{k_3}(\xi)\big)\big(M_{k_1,k_2,k_3}(\xi) + M_{k_1,k_2,k_3}(\xi)^\top\big),$$
where
$$M_{k_1,k_2,k_3}(\xi) = P_*P_{k_1}(\xi)UP_{k_2}(\xi)\mathcal{A}_i'(\xi)P_{k_3}(\xi)P_\perp$$
for all $k_1,k_2,k_3 = 1,\dots,\mu(\xi)$ and all $i = 1,\dots,n$. Now we assume without loss of generality (see equation (6.31)) that the eigenvalues of $\mathcal{A}(x)$ are separated in the following sense: there exists
$$0 < \sigma_0 < \frac{1}{2}\min_{\substack{i,j=1,\dots,\mu(x^*)\\ i\ne j}}|\lambda_i(x^*) - \lambda_j(x^*)|$$
such that for all $j = 1,\dots,m$ there exists exactly one $i \in \{1,\dots,\mu(x^*)\}$ with
$$|\lambda_j(x) - \lambda_i(x^*)| < \sigma_0.$$
Next we determine upper bounds for the terms $\Delta^2\varphi_p(\lambda_{k_1}(\xi),\lambda_{k_2}(\xi),\lambda_{k_3}(\xi))$, $k_1,k_2,k_3 = 1,\dots,\mu(\xi)$. We define $\mu_1(x) = \max\{j \in \{1,\dots,\mu(x)\} \mid \lambda_j(\mathcal{A}(x)) \le -\sigma_0\}$, assume (again without loss of generality) that $\lambda_{k_1}(\xi) \le \lambda_{k_2}(\xi) \le \lambda_{k_3}(\xi)$, and distinguish three cases:

$k_i \le \mu_1(\xi)$ for $i = 1,2,3$: From the convexity of $\varphi$ and $\varphi'$ it follows that
$$\Delta^2\varphi_p\big(\lambda_{k_1}(\xi),\lambda_{k_2}(\xi),\lambda_{k_3}(\xi)\big) \le \varphi_p''(\lambda_{k_3}(\xi)) \le \varphi_p''(-\sigma_0) \le pC_{\varphi'',-\sigma_0}. \tag{6.42}$$

$\exists\, i \in \{1,2,3\} : k_i \le \mu_1(\xi)$ and $\exists\, j \in \{1,2,3\} : k_j > \mu_1(\xi)$: Using the convexity of $\varphi$ and Lemma 6.12 we obtain
$$\Delta^2\varphi_p\big(\lambda_{k_1}(\xi),\lambda_{k_2}(\xi),\lambda_{k_3}(\xi)\big) = \frac{\Delta\varphi_p(\lambda_{k_3}(\xi),\lambda_{k_1}(\xi)) - \Delta\varphi_p(\lambda_{k_3}(\xi),\lambda_{k_2}(\xi))}{\lambda_{k_3}(\xi) - \lambda_{k_1}(\xi)} \le \frac{\varphi_p'(\lambda_{k_3}(\xi)) - \varphi_p'(\lambda_{k_1}(\xi))}{\lambda_{k_3}(\xi) - \lambda_{k_1}(\xi)} \le \frac{\varphi'(C_2)}{\sigma_0}. \tag{6.43}$$

$k_i > \mu_1(\xi)$ for $i = 1,2,3$: Again from the convexity of $\varphi$ and $\varphi'$ and Lemma 6.12 we get the estimate
$$\Delta^2\varphi_p\big(\lambda_{k_1}(\xi),\lambda_{k_2}(\xi),\lambda_{k_3}(\xi)\big) \le \frac{1}{p}\varphi''(C_2). \tag{6.44}$$

Now we will show that in the third case, when $k_i > \mu_1(\xi)$ for $i = 1,2,3$, the estimate
$$\|P_{k_i}(\xi)P_\perp\| \le p\hat C$$
holds true, where $\hat C$ is a constant independent of $p$.
First we obtain
$$\|P_{k_i}(\xi)P_\perp\|^2 = \mathrm{tr}\big(P_{k_i}(\xi)P_\perp\big) = \mathrm{tr}\big(E_\perp^\top P_{k_i}(\xi)E_\perp\big) \le \sum_{k=\mu_1(\xi)+1}^{\mu(\xi)}\mathrm{tr}\big(E_\perp^\top P_k(\xi)E_\perp\big) = \mathrm{tr}\Big(E_\perp^\top\Big(\sum_{k=\mu_1(\xi)+1}^{\mu(\xi)}P_k(\xi)\Big)E_\perp\Big). \tag{6.45}$$
Now, according to results presented in [73], the matrix $P_{x^*,0}(\xi) = \sum_{k=\mu_1(\xi)+1}^{\mu(\xi)}P_k(\xi)$ has the following properties:
- $P_{x^*,0}(\xi)$ is analytic in a neighborhood of $x^*$,
- $P_{x^*,0}(x^*) = P_0$,
- $\displaystyle\frac{\partial}{\partial x_i}P_{x^*,0}(x) = \sum_{k=1}^{\mu_1(x)}\sum_{l=\mu_1+1}^{\mu(x)}\frac{1}{\lambda_k(x) - \lambda_l(x)}\big(B_{i,k,l}(x) + B_{i,k,l}(x)^\top\big)$, where $B_{i,k,l}(x) = P_k(x)\mathcal{A}_i'(x)P_l(x)$ for all $i = 1,\dots,n$.

From the last equation it is easy to see that $\frac{\partial}{\partial x}P_{x^*,0}(x)$ is bounded by a constant $\tilde C > 0$ in a neighborhood of $x^*$. Using Taylor expansion again, we obtain:
$$\mathrm{tr}\big(E_\perp^\top P_{x^*,0}(\xi)E_\perp\big) = \mathrm{tr}\big(E_\perp^\top P_0E_\perp\big) + \sum_{i=1}^n\mathrm{tr}\Big(E_\perp^\top\frac{\partial}{\partial x_i}P_{x^*,0}(\xi_2)E_\perp\Big)(\xi - x^*)_i \le \tilde C\|\Delta x\| \le p\tilde CC_1\|U - U^*\| \le p\tilde CC_1\big(\Theta + \|U^*\|\big) \tag{6.46}$$
for some $\xi_2 \in S(x^*,\epsilon_1)$. Finally the assertion of Lemma 6.13 follows from the combination of the estimates (6.42) to (6.46) and the fact that the norms of
- $U$,
- $P_k(x)$ for $k = 1,\dots,\mu(x)$ and
- $\mathcal{A}_i'(x)$ for $i = 1,\dots,n$

are bounded in a compact neighborhood of $x^*$.

Now, due to (6.29) and Lemma 6.13, we can find a constant $C_3 > 0$ independent of $p$ such that
$$\|P_*(\hat U - U^*)P_\perp\| \le \big\|P_*\big(D\Phi_p(\mathcal{A}(x^*))[U] - U^*\big)P_\perp\big\| + pC_3\|U - U^*\|. \tag{6.47}$$
Next, using $U^*P_\perp = 0$ we see
$$\big\|P_0\big(D\Phi_p(\mathcal{A}(x^*))[U] - U^*\big)P_\perp\big\| = \Big\|\sum_{k=1}^{\mu(x^*)-1}\frac{p\varphi(\lambda_k/p)}{\lambda_k}P_0UP_k(x^*)\Big\| \le p\frac{|C_{\varphi,-\infty}|}{\sigma}\|P_0UP_\perp\| = p\frac{|C_{\varphi,-\infty}|}{\sigma}\|P_0(U - U^*)P_\perp\| \le pC_4\|U - U^*\|, \tag{6.48}$$
where $C_4 = m\frac{|C_{\varphi,-\infty}|}{\sigma}$. Analogously we obtain
$$\big\|P_\perp\big(D\Phi_p(\mathcal{A}(x^*))[U] - U^*\big)P_\perp\big\| = \Big\|\sum_{k,l=1}^{\mu(x^*)-1}\Delta\varphi_p(\lambda_k,\lambda_l)P_k(x^*)UP_l(x^*)\Big\| \le p\varphi'(\sigma/2)\|P_\perp UP_\perp\| = p\varphi'(\sigma/2)\|P_\perp(U - U^*)P_\perp\| \le pC_5\|U - U^*\|, \tag{6.49}$$
where $C_5 = m\varphi'(\sigma/2)$. Finally estimate (6.1) follows from the combination of estimates (6.47) to (6.49) with estimate (6.29).
We conclude the proof of assertion (G2) with the following observation: From Theorem 5.1 we get $F_x'(\hat x(U^*,p),U^*,p) = 0$ and due to Corollary 5.5 the function $F(x,U^*,p)$ is strongly convex in a neighborhood of $\hat x(U^*,p)$. Hence
$$\hat x(U^*,p) = \mathrm{argmin}\{F(x,U^*,p) \mid x \in \mathbb{R}^n\} = x^*$$
and it immediately follows that
$$\hat U(U^*,p) = D\Phi_p(\mathcal{A}(x^*))[U^*] = \varphi_p'(0)P_0U^*P_0 = U^*.$$
It remains to verify (G3) and to complete the proof of (G1). In particular we have to show that
- $F(x,U,p)$ is strongly convex in a neighborhood of $\hat x(U,p)$ and
- $\hat x = \mathrm{argmin}\{F(x,U,p) \mid x \in \mathbb{R}^n\}$.

Equations (6.16) and (6.17) show that $\hat x$ from Proposition 6.10 satisfies the equation $F_x'(\hat x,U,p) = 0$. In Corollary 5.5 we have shown that $F(x,U^*,p)$ is strongly convex in a neighborhood of $x^*$. Therefore we find $\gamma > 0$ such that
$$y^\top F_{xx}''(x^*,U^*,p)y > \gamma\|y\|^2$$
for all $y \in \mathbb{R}^n$. On the other hand, estimate (6.1) shows that for small enough $p_0 > 0$ we have $\hat x(U,p)$ near $x^*$ and $\hat U(U,p)$ near $U^*$ uniformly in $\mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. Consequently we find small enough $p_0 > 0$ and $\delta > 0$ such that the inequality
$$y^\top F_{xx}''(\hat x(U,p),U,p)y > \frac{1}{2}\gamma\|y\|^2$$
holds for any $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$. Hence $F(x,U,p)$ is strongly convex in a neighborhood of $\hat x(U,p)$ and the proof of assertion (G3) is complete.

Now from (G3) it follows that $\hat x(U,p)$ is a local minimum of $F(x,U,p)$ in a neighborhood of $\hat x(U,p)$ and due to (6.1) also in a neighborhood of $x^*$. It remains to show that the neighborhood of $x^*$ can be extended to $\Omega_p$ and consequently to $\mathbb{R}^n$. We start with the estimate:
$$F(\hat x(U,p),U,p) \le F(x^*,U,p) = f(x^*) + \big\langle U,\Phi_p(\mathcal{A}(x^*))\big\rangle = f(x^*) + \big\langle U,P_0\Phi_p(\mathcal{A}(x^*))P_0\big\rangle + \big\langle U,P_\perp\Phi_p(\mathcal{A}(x^*))P_\perp\big\rangle \le f(x^*) + \big\langle U,P_0\Phi_p(\mathcal{A}(x^*))P_0\big\rangle = f(x^*). \tag{6.50}$$
Now suppose that there exists a constant $C > 0$ such that for each $p_0 > 0$ there exist some $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$ and some $\tilde x \in \Omega_p$ such that
$$F(\tilde x,U,p) \le F(\hat x,U,p) - C.$$
Then from (6.50) we obtain
$$F(\tilde x,U,p) \le f(x^*) - C. \tag{6.51}$$
Now let $\Phi_p(\mathcal{A}(\tilde x)) = S(\tilde x)\Lambda(\tilde x)S(\tilde x)^\top$ and define $\Phi_p(\mathcal{A}(\tilde x))^- = S(\tilde x)\Lambda^-(\tilde x)S(\tilde x)^\top$, where $\Lambda^-(\tilde x) = \mathrm{diag}\big(\min(0,\lambda_1(\tilde x)),\dots,\min(0,\lambda_m(\tilde x))\big)$. Then from (6.51) we see that
$$f(\tilde x) \le f(x^*) - \big\langle U,\Phi_p(\mathcal{A}(\tilde x))^-\big\rangle - C.$$
Consequently from assumption (A5) and $\|U\| < \Theta$ we get
$$f(\tilde x) \le f(x^*) - \frac{C}{2}$$
for $p_0$ small enough and all $p \le p_0$. On the other hand it is clear that
$$f(\tilde x) \ge \min\{f(x) \mid x \in \Omega_p\}$$
and from the continuity of $f$ it follows that
$$\min\{f(x) \mid x \in \Omega_p\} \ge f(x^*) - \frac{C}{4}$$
for $p_0$ small enough, and thus we obtain
$$f(\tilde x) \ge f(x^*) - \frac{C}{4}.$$
This contradiction completes the proof of assertion (G1).

Let us summarize our results in the following theorem:

Theorem 6.14 Let $\mathcal{A}(x)$ be twice continuously differentiable and let assumptions (A1) to (A5) hold. Let further $\Theta > \lambda_m(U^*)$ and $0 < \epsilon < \lambda_{m-r+1}(U^*)$. Then there exist $p_0 > 0$ and small enough $\delta > 0$ such that for any $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$:
a) There exists a vector
$$\hat x = \hat x(U,p) = \mathrm{argmin}\{F(x,U,p) \mid x \in \mathbb{R}^n\}$$
such that $F_x'(\hat x,U,p) = 0$.
b) For the pair $\hat x$ and $\hat U = \hat U(U,p) = D\Phi_p(\mathcal{A}(\hat x(U,p)))[U]$ the estimate
$$\max\big\{\|\hat x - x^*\|, \|\hat U - U^*\|\big\} \le Cp\|U - U^*\| \tag{6.52}$$
holds, where $C$ is a constant independent of $p$.
c) $\hat x(U^*,p) = x^*$ and $\hat U(U^*,p) = U^*$.
d) The function $F(x,U,p)$ is strongly convex with respect to $x$ in a neighborhood of $\hat x(U,p)$.

Remark. Let us briefly discuss in which way Theorem 6.14 has to be adapted if we replace assumption $(\varphi_4)$ in Definition 4.1 by the weaker assumption $(\varphi_4')$. It turns out that, while assertions a), c) and d) remain unchanged, the estimate in assertion b) becomes
$$\max\big\{\|\hat x - x^*\|, \|\hat U - U^*\|\big\} \le Cp\varphi(-1/p)\|U - U^*\|. \tag{6.53}$$
Now assumption $(\varphi_4')$ guarantees that we find $p$ small enough such that the contractive character of (6.53) can be maintained.

The following Corollary is a direct consequence of Theorem 6.14 c):

Corollary 6.15 Let $\epsilon$, $\Theta$, $\delta$ and $p_0$ be given as in Theorem 6.14.
Then for $p \le p_0$
$$\hat x_l(U,p) = \mathrm{argmin}\{F(x,U,p) \mid x \in \mathbb{R}^n,\ \|x - x^+\| \le \nu\}. \tag{6.54}$$
Moreover Theorem 6.14 d) guarantees the existence of $\eta > 0$ such that $F(x,U,p)$ is strongly convex in $S(x^+,\eta)$ for all $(U,p) \in \mathcal{V}(U^+,p_0,\delta,\epsilon,\Theta)$ and appropriately chosen parameters $p_0$, $\delta$, $\epsilon$ and $\Theta$. Consequently any local descent method applied to the problem
$$(i')\qquad \text{Find } x^{k+1} \text{ such that } F_x'(x,U^k,p^k) = 0 \tag{6.55}$$
will automatically find a solution which satisfies the additional constraint $\|x^{k+1} - x^+\| \le \nu$, if it is started with $x^1$ close enough to $x^+$. Thus, if the first step in Algorithm 6.1.1 is replaced by step (i') above, the resulting algorithm is guaranteed to converge to $(x^+,U^+)$ with at least linear rate of convergence, provided that $(U^1,p^1) \in \mathcal{V}(U^+,p_0,\delta,\epsilon,\Theta)$, $x^1 \in S(x^+,\min\{\eta,\nu\})$ and $p^1$ is small enough.

Remark. Whenever we solve a problem of type Algorithm 6.1.1(i) in practice, we have to replace the exact minimum by an approximation of certain precision, which can be determined by a finite procedure. In [70] this problem is analyzed for the so-called nonlinear rescaling method. We have not explored this problem in the context of nonlinear semidefinite programming so far. Nevertheless, in practice we achieve results of high precision with a relatively moderate stopping criterion for the unconstrained minimization (compare Section 9.3.2).

Chapter 7

Globally Convergent Algorithms

In the preceding section we have seen that Algorithm 6.1.1 converges provided the initial iterates are chosen appropriately. Unfortunately, in many practical situations no such initial iterates are known and consequently Algorithm 6.1.1 is not applicable. A possible remedy is provided by the following hybrid strategy:
(i) Use a globally convergent algorithm (which is guaranteed to converge to a solution of (SDP) for arbitrary initial iterates) to find $(U,p) \in \mathcal{V}(U^*,p_0,\delta,\epsilon,\Theta)$ with $p < \frac{C}{2}$.
(ii) Apply Algorithm 6.1.1.
In the framework of this section we are going to present two modified versions of Algorithm 6.1.1, which both turn out to be globally convergent under certain assumptions.

7.1 The Shifted Barrier Approach

In our first approach we use a constant multiplier matrix during all iterations. The initial multiplier is chosen to be the identity matrix. This leads to the following algorithm, which we call the shifted barrier approach below.

Algorithm 7.1.1 Let $x^1 \in \mathbb{R}^n$ and $p^1 > 0$ be given. Then for $k = 1,2,3,\dots$ repeat till a stopping criterion is reached:
(i) $x^{k+1} = \mathrm{argmin}_{x\in\mathbb{R}^n}F(x,I_m,p^k)$,
(ii) $U^{k+1} = D\Phi_p(\mathcal{A}(x^{k+1}))[I_m]$,
(iii) choose a new penalty parameter $p^{k+1}$.

Remark. Again we are searching for a global minimum of an unconstrained optimization problem in step (i) of Algorithm 7.1.1. From a practical point of view such an algorithm is only recommendable for problems of moderate dimension (where global optimization techniques can be applied) or if the underlying problem is convex. In contrast to the preceding section, where we discussed the local convergence properties of Algorithm 6.1.1, the global minimization is essential in the analysis presented in this section.

Definition 7.1 A class of shifted barrier functions $M_\Phi : \mathbb{R}^n\times\mathbb{R} \to \mathbb{R}$ is defined by
$$M_\Phi(x,p) = \begin{cases} f(x) + \big\langle I_m,\Phi_p(\mathcal{A}(x))\big\rangle = F(x,I_m,p) & \text{for } x \in \Omega_p,\\ \infty & \text{for } x \notin \Omega_p.\end{cases} \tag{7.1}$$
In order to simplify notation we omit the subscript $\Phi$ in the remainder of this section. Along with $M(x,p)$ we define
$$M(x,0) = \lim_{p\to 0}M(x,p) = \begin{cases} f(x) & \text{for } x \in \Omega,\\ \infty & \text{for } x \notin \Omega.\end{cases} \tag{7.2}$$
The latter limit can be seen from the following Lemma.

Lemma 7.1 If assumption (A5) holds, then there exists $p_0 > 0$ such that
$$f(x) \ge M(x,p) \ge f(x) - O\Big(p\varphi\Big(-\frac{1}{p}\Big)\Big)$$
for all $x \in \Omega$.

Proof. The first inequality follows from $\Phi_p(\mathcal{A}(x)) \preceq 0$ for all $x \in \Omega$. Assumption (A5) guarantees the existence of $\pi > 0$ such that
$$\lambda_k(\mathcal{A}(x)) \ge -\pi \quad\text{for all } x \in \Omega_p \text{ and all } k = 1,\dots,m.$$
Now the right inequality follows from the properties of $\varphi$.
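The loop structure of Algorithm 7.1.1 can be sketched for the simplest case of a single scalar constraint $a(x) \le 0$ (a $1\times1$ semidefinite constraint). The penalty choice $\varphi(t) = e^t - 1$ below is an illustrative assumption on our part (it satisfies $\varphi(0) = 0$, $\varphi'(0) = 1$, convexity and monotonicity; the thesis keeps $\varphi$ general), and the global minimization of step (i) is replaced by bisection on the stationarity condition, which suffices for this convex model problem:

```python
import math

def shifted_barrier_1d(f_grad, a, a_grad, p1=1.0, outer=8):
    """One-dimensional sketch of Algorithm 7.1.1 for min f(x) s.t. a(x) <= 0,
    with the illustrative penalty phi(t) = e^t - 1, so phi_p'(t) = e^{t/p}.
    Step (i) minimizes F(x, 1, p) = f(x) + p*phi(a(x)/p) by bisection on
    F_x = 0; step (ii) is the multiplier update u = D Phi_p(a(x))[1]
    = phi'(a(x)/p); step (iii) decreases the penalty parameter."""
    p = p1
    x = u = None
    for _ in range(outer):
        def stat(x):
            # F_x(x, 1, p); the exponent is clipped as a numerical safeguard
            return f_grad(x) + math.exp(min(a(x) / p, 50.0)) * a_grad(x)
        lo, hi = -10.0, 10.0                 # chosen so that stat(lo) < 0 < stat(hi)
        for _ in range(100):                 # step (i): bisection
            mid = 0.5 * (lo + hi)
            if stat(mid) > 0.0:
                hi = mid
            else:
                lo = mid
        x = 0.5 * (lo + hi)
        u = math.exp(a(x) / p)               # step (ii): multiplier update
        p *= 0.5                             # step (iii): decrease p
    return x, u
```

For instance, with $f(x) = (x-2)^2$ and $a(x) = x - 1$ the solution is $x^* = 1$ with multiplier $u^* = 2$ (from $2(x^* - 2) + u^* = 0$), and the iterates approach $(1,2)$ as $p \to 0$, mirroring Proposition 7.2.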
Next we introduce an additional assumption:

(A6) There exists $p_0 > 0$ such that $\Omega_p$ is a compact set for all $p \le p_0$, (7.3)

and formulate the following Proposition.

Proposition 7.2 Let $f$ and $\mathcal{A}$ be twice continuously differentiable and assume that (A1) to (A4) and (A6) hold.
a) Then for any $p \le p_0$ there exists
$$x(p) = \mathrm{argmin}\{M(x,p) \mid x \in \mathbb{R}^n\}$$
such that
$$M_x'(x(p),p) = 0 \quad\text{and}\quad \lim_{p\to 0}f(x(p)) = \lim_{p\to 0}M(x(p),p) = f(x^*).$$
b) The pair $x(p)$ and $U(p) = D\Phi_p(\mathcal{A}(x(p)))[I]$ converges to $(x^*,U^*)$ as $p$ converges to 0.

Proof. a) The existence of $x(p)$ for all $p \le p_0$ follows immediately from assumption (A6). Moreover $M(x,p)$ is continuous in $\mathrm{int}(\Omega_p)$ and increases infinitely as $x$ approaches the boundary of $\Omega_p$; consequently $M_x'(x(p),p) = 0$. Now consider a convergent subsequence $\{x(p_s)\} \subset \{x(p)\}$ and let $\lim_{p_s\to 0}x(p_s) = \bar x$. Using (7.2) it is easy to see that $\bar x \in \Omega$. Now, from Lemma 7.1 we see that for any $\epsilon > 0$ we find $p_s$ small enough such that
$$f(x(p_s)) - \epsilon \le f(\bar x) \le f(\bar x) + \mathrm{tr}\,\Phi_{p_s}(\mathcal{A}(\bar x)) + \epsilon \le M(x(p_s),p_s) + 2\epsilon$$
and we obtain:
$$f(x(p_s)) \le M(x(p_s),p_s) + 3\epsilon \le M(x^*,p_s) + 3\epsilon \le f(x^*) + 3\epsilon.$$
Since $\epsilon$ was chosen arbitrarily, we get
$$f(\bar x) = \lim_{p_s\to 0}f(x(p_s)) \le f(x^*) \quad\text{and thus}\quad f(\bar x) = f(x^*).$$
Consequently for any convergent subsequence $\{x(p_s)\}$ with $p_s \to 0$ we have
$$\lim_{p_s\to 0}f(x(p_s)) = f(\bar x) = f(x^*),$$
hence $\lim_{p\to 0}f(x(p)) = f(x^*)$ and therefore $\lim_{p\to 0}M(x(p),p) = f(x^*)$.

b) Assumptions (A2) to (A4) guarantee that $(x^*,U^*)$ is a unique KKT pair. Therefore we conclude from a) that $\lim_{p\to 0}x(p) = x^*$. Next we will show that $\lim_{p\to 0}U(p) = U^*$. First we rewrite the multiplier update formula making use of the eigenvalue decomposition $\mathcal{A}(x(p)) = S(x(p))\,\mathrm{diag}(\lambda_1(x(p)),\dots,\lambda_m(x(p)))\,S(x(p))^\top$:
$$U(p) = D\Phi_p(\mathcal{A}(x(p)))[I] = S(x(p))\Big(\big[\Delta\varphi_p(\lambda_i(x(p)),\lambda_j(x(p)))\big]_{i,j=1}^m \bullet S(x(p))^\top IS(x(p))\Big)S(x(p))^\top = \Phi_p'(\mathcal{A}(x(p))),$$
where
$$\Phi_p' : S^m \to S^m,\qquad \mathcal{A}(x(p)) \mapsto S(x(p))\begin{pmatrix} \varphi_p'(\lambda_1(x(p))) & 0 & \cdots & 0\\ 0 & \ddots & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \varphi_p'(\lambda_m(x(p)))\end{pmatrix}S(x(p))^\top.$$
From the definitions of $x(p)$ and $U(p)$ and a) we obtain
$$0 = M_x'(x(p),p) = f'(x(p)) + \Big[\big\langle\Phi_p'(\mathcal{A}(x(p))),\mathcal{A}_i'(x(p))\big\rangle\Big]_{i=1}^n$$
for all $p \le p_0$. Hence
$$0 = \lim_{p\to 0}\Big(f'(x(p)) + \Big[\big\langle\Phi_p'(\mathcal{A}(x(p))),\mathcal{A}_i'(x(p))\big\rangle\Big]_{i=1}^n\Big) = f'(x^*) + \lim_{p\to 0}\Big[\big\langle\Phi_p'(\mathcal{A}(x(p))),\mathcal{A}_i'\big\rangle\Big]_{i=1}^n$$
and $\lim_{p\to 0}\Phi_p'(\mathcal{A}(x(p)))$ exists. Furthermore
$$\lim_{p\to 0}\big\langle\mathcal{A}(x(p)),\Phi_p'(\mathcal{A}(x(p)))\big\rangle = \lim_{p\to 0}\sum_{i=1}^m\lambda_i(x(p))\varphi_p'(\lambda_i(x(p))) = \sum_{i=1}^m\lambda_i(x^*)\lim_{p\to 0}\varphi_p'(\lambda_i(x(p))) = \sum_{i=1}^{m-r}\lambda_i(x^*)\underbrace{\lim_{p\to 0}\varphi_p'(\lambda_i(x(p)))}_{=0} = 0.$$
Now from the uniqueness of the KKT pair $(x^*,U^*)$ we conclude $\lim_{p\to 0}U(p) = U^*$.

Next we want to derive upper bounds for the error estimates $\|x(p) - x^*\|$ and $\|U(p) - U^*\|$. Let us therefore define $\Delta x = x(p) - x^*$, $\Delta U = U(p) - U^*$ and a function
$$\tilde U : \mathbb{R}^n\times\mathbb{R}_+ \to S^m_+,\qquad (x,p) \mapsto D\Phi_p(\mathcal{A}(x))[I].$$
Now, using Taylor approximations in local neighborhoods of $x^*$ and the notation introduced in Chapter 5, we obtain the following formulas:
$$\mathcal{A}_i'(x) = \mathcal{A}_i' + \sum_{j=1}^n\mathcal{A}_{i,j}''\Delta x_j + H_i^{\mathcal{A}}(\Delta x) \quad\text{for all } i = 1,\dots,n, \tag{7.4}$$
where $H_i^{\mathcal{A}} : \mathbb{R}^n \to S^m$, $H_i^{\mathcal{A}}(0) = 0$ and $\|H_i^{\mathcal{A}}\| = o(\|\Delta x\|)$ for all $i = 1,\dots,n$,
$$\tilde U(x,p) = \tilde U(x^*,p) + \sum_{j=1}^n\frac{\partial}{\partial x_j}\tilde U(x^*,p)\Delta x_j + H^{\tilde U}(\Delta x) \tag{7.5}$$
$$= S\,\mathrm{diag}\big(\varphi_p'(\lambda_1(x^*)),\dots,\varphi_p'(\lambda_{m-r}(x^*)),1,\dots,1\big)S^\top + \cdots$$