OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

By

BAHA’ M. ALZALG

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

WASHINGTON STATE UNIVERSITY Department of Mathematics

DECEMBER 2011

To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of BAHA’ M. ALZALG find it satisfactory and recommend that it be accepted.

K. A. Ariyawansa, Professor, Chair

Robert Mifflin, Professor

David S. Watkins, Professor

For all the people

ACKNOWLEDGEMENTS

My greatest appreciation and my most sincere "Thank You!" go to my advisor Professor Ari Ariyawansa for his guidance, advice, and help during the preparation of this dissertation. I am also grateful for his offer for me to work as his research assistant and for giving me the opportunity to write papers and attend several conferences and workshops in North America and overseas. I wish also to express my appreciation and gratitude to Professor Robert Mifflin and Professor David S. Watkins for taking the time to serve as committee members and for the various ways in which they helped me during all stages of my doctoral studies. I want to thank all faculty, staff, and graduate students in the Department of Mathematics at Washington State University. I would especially like to thank, from the faculty, Associate Professor Bala Krishnamoorthy; from the staff, Kris Johnson; and from the students, Pietro Paparella for their kind help. Finally, no words can express my gratitude to my parents and my grandmother for their love and prayers. I also owe special gratitude to my brothers and sisters, and to some relatives in Jordan, for their support and encouragement.

OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

Abstract

by Baha’ M. Alzalg, Ph.D. Washington State University December 2011

Chair: Professor K. A. Ariyawansa

We introduce and study two-stage stochastic symmetric programs (SSPs) with recourse to handle uncertainty in data defining (deterministic) symmetric programs in which a linear function is minimized over the intersection of an affine set and a symmetric cone. We present a logarithmic barrier decomposition-based interior point algorithm for solving these problems and prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first stage solutions. Since our analysis applies to all symmetric cones, this algorithm extends Zhao's results [48] for two-stage stochastic linear programs, and Mehrotra and Özevin's results [25] for two-stage stochastic semidefinite programs (SSDPs). We also present another class of polynomial-time decomposition algorithms for SSPs based on the volumetric barrier. While this extends the work of Ariyawansa and Zhu [10] for SSDPs, our analysis is based on utilizing the advantage of the special algebraic structure associated with the symmetric cone not utilized in [10]. As a consequence, we are able to significantly simplify the proofs of central results. We then describe four applications leading to the SSP problem where, in particular, the underlying symmetric cones are second-order cones and rotated quadratic cones.

Contents

Acknowledgements
Abstract

1 Introduction and Background
  1.1 Introduction
  1.2 What is a symmetric cone?
  1.3 Symmetric cones and Euclidean Jordan algebras

2 Stochastic Symmetric Optimization Problems
  2.1 The stochastic symmetric optimization problem
    2.1.1 Definition of an SSP in primal standard form
    2.1.2 Definition of an SSP in dual standard form
  2.2 Problems that can be cast as SSPs

3 A Class of Polynomial Logarithmic Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  3.1 The log barrier problem for SSPs
    3.1.1 Formulation and assumptions
    3.1.2 Computation of ∇_x η̃(µ, x) and ∇²_{xx} η̃(µ, x)
  3.2 Self-concordance properties of the log-barrier recourse
    3.2.1 Self-concordance of the recourse function
    3.2.2 Parameters of the self-concordant family
  3.3 A class of logarithmic barrier algorithms for solving SSPs
  3.4 Complexity analysis
    3.4.1 Complexity for short-step algorithm
    3.4.2 Complexity for long-step algorithm

4 A Class of Polynomial Volumetric Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  4.1 The volumetric barrier problem for SSPs
    4.1.1 Formulation and assumptions
    4.1.2 The volumetric barrier problem for SSPs
    4.1.3 Computation of ∇_x η(µ, x) and ∇²_{xx} η(µ, x)
  4.2 Self-concordance properties of the volumetric barrier recourse
    4.2.1 Self-concordance of η(µ, ·)
    4.2.2 Parameters of the self-concordant family
  4.3 A class of volumetric barrier algorithms for solving SSPs
  4.4 Complexity analysis
    4.4.1 Complexity for short-step algorithm
    4.4.2 Complexity for long-step algorithm

5 Some Applications
  5.1 Two applications of SSOCPs
    5.1.1 Stochastic Euclidean facility location problem
    5.1.2 Portfolio optimization with loss risk constraints
  5.2 Two applications of SRQCPs
    5.2.1 Optimal covering random ellipsoid problem
    5.2.2 Structural optimization

6 Related Open Problems: Multi-Order Cone Programming Problems
  6.1 Multi-order cone programming problems
  6.2 Duality
  6.3 Multi-order cone programming problems over integers
  6.4 Multi-order cone programming problems under uncertainty
  6.5 An application
    6.5.1 CERFLPs—an MOCP model
    6.5.2 DERFLPs—a 0-1MOCP model
    6.5.3 ERFLPs with integrality constraints—an MIMOCP model
    6.5.4 Stochastic CERFLPs—an SMOCP model

7 Conclusion

List of Abbreviations

CERFLP    continuous Euclidean-rectilinear facility location problem
CFLP      continuous facility location problem
DERFLP    discrete Euclidean-rectilinear facility location problem
DFLP      discrete facility location problem
DLP       deterministic linear programming
DRQCP     deterministic rotated quadratic cone programming
DSDP      deterministic semidefinite programming
DSOCP     deterministic second-order cone programming
DSP       deterministic symmetric programming
EFLP      Euclidean facility location problem
ERFLP     Euclidean-rectilinear facility location problem
ESFLP     Euclidean single facility location problem
FLP       facility location problem
KKT       Karush-Kuhn-Tucker conditions
MFLP      multiple facility location problem
MIMOCP    mixed integer multi-order cone programming
MOCP      (deterministic) multi-order cone programming
0-1MOCP   0-1 multi-order cone programming
POCP      pth-order cone programming
RFLP      rectilinear facility location problem
SFLP      stochastic facility location problem
SLP       stochastic linear programming
SMOCP     stochastic multi-order cone programming
SRQCP     stochastic rotated quadratic cone programming
SSDP      stochastic semidefinite programming
SSOCP     stochastic second-order cone programming
SSP       stochastic symmetric programming

List of Notations

Arw(x)        the arrow-shaped matrix associated with the vector x
Aut(K)        the automorphism group of a cone K
diag(·)       the operator that maps its argument to a block diagonal matrix
GL(n, R)      the general linear group of degree n over R
int(K)        the interior of a cone K
E^n           the n-dimensional real vector space whose elements are indexed from 0
E^n_+         the second-order cone of dimension n
Ê^n_+         the rotated quadratic cone of dimension n
H^n           the space of complex Hermitian matrices of order n
H^n_+         the cone of complex Hermitian positive semidefinite matrices of order n
K_J           the cone of squares of a Euclidean Jordan algebra J
Q^n_p         the pth-order cone of dimension n
QH^n          the space of quaternion Hermitian matrices of order n
QH^n_+        the cone of quaternion Hermitian positive semidefinite matrices of order n
S^n           the space of real symmetric matrices of order n
S^n_+         the cone of real symmetric positive semidefinite matrices of order n
R^n           the space of real vectors of dimension n
R^n_+         the nonnegative orthant cone of R^n
||x||         the Frobenius norm of an element x
||x||_2       the Euclidean norm of a vector x
||x||_p       the p-norm of a vector x
bxc           the linear representation of an element x
dxe           the quadratic representation of an element x
X ⪰ 0         the matrix X is positive semidefinite
X ≻ 0         the matrix X is positive definite
x ⪰ 0         the vector x lies in a second-order cone of appropriate dimension
x ≻ 0         the vector x lies in the interior of a second-order cone of appropriate dimension
x ⪰_N 0       the vector x lies in the Cartesian product of N second-order cones of appropriate dimensions
x ⪰̂ 0         the vector x lies in a rotated quadratic cone of appropriate dimension
x ⪰̂_N 0       the vector x lies in the Cartesian product of N rotated quadratic cones of appropriate dimensions
x ⪰_{K_J} 0   x is an element of a symmetric cone K_J
x ≻_{K_J} 0   x is an element of the interior of a symmetric cone K_J
x ⪰_⟨p⟩ 0     the vector x lies in a pth-order cone of appropriate dimension
x ⪰_{N⟨p⟩} 0  the vector x lies in the Cartesian product of N pth-order cones of appropriate dimensions
x ⪰_⟨p1,...,pN⟩ 0  the vector x lies in the Cartesian product of N cones of orders p1, p2, ..., pN and appropriate dimensions
x ◦ y         the Jordan multiplication of elements x and y of a Jordan algebra
x • y         the inner product trace(x ◦ y) of elements x and y of a Euclidean Jordan algebra

Chapter 1

Introduction and Background

1.1 Introduction

The purpose of this dissertation is to introduce the two-stage stochastic symmetric programs (SSPs)¹ with recourse and to study this problem in the dual standard form:

max  c^T x + E[Q(x, ω)]
s.t. Ax + ξ = b                                            (1.1.1)
     ξ ∈ K_1,

where x and ξ are the first-stage decision variables, and Q(x, ω) is the maximum value of the problem

max  d(ω)^T y
s.t. W(ω)y + ζ = h(ω) − T(ω)x                              (1.1.2)
     ζ ∈ K_2,

where y and ζ are the second-stage decision variables, E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω), the matrix A and the vectors b and c are deterministic data, and the matrices W(ω) and T(ω) and the vectors h(ω) and d(ω) are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. The cones K_1 and K_2 are symmetric cones (i.e., closed, convex, pointed, self-dual cones with their automorphism groups acting transitively on their interiors) in R^{n_1} and R^{n_2}. (Here, n_1 and n_2 are positive integers.)

¹Based on tradition in the optimization literature, we use the term stochastic symmetric program to mean the generic form of a problem, and the term stochastic symmetric programming to mean the field of activities based on that problem. While both will be denoted by the acronym SSP, the plural of the first usage will be denoted by the acronym SSPs. The acronyms DLP, DRQCP, DSDP, DSOCP, DSP, MIMOCP, MOCP, 0-1MOCP, SLP, SMOCP, SRQCP, SSDP, and SSOCP are defined and used in accordance with this custom.

The birth of symmetric programming (also known as symmetric cone programming [36]) as a subfield of convex optimization can be dated back to 2003. The main motivation for this generalization was its ability to handle many important applications that cannot be covered by linear programming. In symmetric programs, we minimize a linear function over the intersection of an affine set and a so-called symmetric cone. In particular, if the symmetric cone is the nonnegative orthant cone, the result is linear programming; if it is the second-order cone, the result is second-order cone programming; and if it is the semidefinite cone (the cone of all real symmetric positive semidefinite matrices), the result is semidefinite programming. Thus symmetric programming is a generalization of linear programming that includes second-order cone programming and semidefinite programming as special cases. We shall refer to such problems as deterministic linear programs (DLPs), deterministic second-order cone programs (DSOCPs), deterministic semidefinite programs (DSDPs), and deterministic symmetric programs (DSPs) because the data defining the applications leading to such problems are assumed to be known with certainty. It has been found that DSPs are related to many application areas including finance, geometry, robust linear programming, matrix optimization, norm minimization problems, and relaxations in combinatorial optimization. We refer the reader to the survey papers by Todd [38] and Vandenberghe and Boyd [41], which discuss, in particular, DSDPs and their applications, and the survey papers of Alizadeh and Goldfarb [1] and Lobo et al. [23], which discuss, in particular, DSOCPs with a number of applications in many areas, including a variety of engineering applications.

Deterministic optimization problems are formulated to find optimal decisions in problems with certainty in data. In fact, in some applications we cannot specify the model entirely because it depends on information which is not available at the time of formulation but will be determined at some point in the future. Stochastic programs have been studied since the 1950s to handle problems that involve uncertainty in data; see [17, 32, 33] and the references contained therein. In particular, two-stage stochastic linear programs (SLPs) have been established to formulate many applications (see [15] for example) of linear programming with uncertain data. There are efficient algorithms (both interior and noninterior point) for solving SLPs. The class of SSP problems may be viewed as an extension of DSPs (by allowing uncertainty in data) on the one hand, and as an extension of SLPs (where K_1 and K_2 are both nonnegative orthant cones) or, more generally, of stochastic semidefinite programs (SSDPs) with recourse [9, 25] (where K_1 and K_2 are both semidefinite cones) on the other hand.

Interior point methods [30] are considered to be one of the most successful classes of algorithms for solving deterministic (linear and nonlinear) convex optimization problems. This provides motivation to investigate whether decomposition-based interior point algorithms can be developed for stochastic programming. Zhao [48] derived a decomposition algorithm for SLPs based on a logarithmic barrier and proved its polynomial complexity. Mehrotra and Özevin [25] have proved important results that extend the work of Zhao [48] to the case of SSDPs, including the derivation of a polynomial logarithmic barrier decomposition algorithm for this class of problems that extends Zhao's algorithm for SLPs. An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40] (see also [5, 6, 7]). It has been observed [8] that certain cutting plane algorithms [21] for SLPs based on the volumetric barrier perform better in practice than those based on the logarithmic barrier. Recently, Ariyawansa and Zhu [10] have derived a class of decomposition algorithms for SSDPs based on a volumetric barrier, analogous to the work of Mehrotra and Özevin [25], by utilizing the work of Anstreicher [7] for DSDP.

Concerning algorithms for DSPs, Schmieta and Alizadeh [35, 36] and Rangarajan [34] have extended interior point algorithms for DSDP to DSP. Concerning algorithms for SSPs, we know of no interior point algorithms for solving them that exploit the special structure of the symmetric cone as is done in [35, 36, 34] for DSP. The question that naturally arises now is whether interior point methods can be derived for solving SSPs and, if the answer is affirmative, whether or not we can prove the polynomial complexity of the resulting algorithms. Our particular concern in this dissertation is to extend decomposition-based interior point methods for SSDPs to stochastic optimization problems over all symmetric cones, based on both logarithmic and volumetric barriers, and also to prove the polynomial complexity of the resulting algorithms. In fact, there is a unifying theory [18] based on Euclidean Jordan algebras that connects all symmetric cones. This theory is very important for a sound understanding of the Jordan algebraic characterization of symmetric cones and, hence, a correct understanding of the very close equivalence between Problem (1.1.1, 1.1.2) and our constructive definition of an SSP, which will be presented in Chapter 2.

Now let us indicate briefly how to solve Problem (1.1.1, 1.1.2). We assume that the event space Ω is discrete and finite. In practice, the case of interest is when the random data have a finite number of realizations, because the inputs of the modeling process require that SSPs be solved for a finite number of scenarios. The general scheme of the decomposition-based interior point algorithms is as follows. Since we have an SSP with finitely many scenarios, we can explicitly formulate the problem as a large-scale DSP; we then use primal-dual interior point methods to solve the resulting large-scale formulation directly, which can be successfully accomplished by utilizing the special structure of the underlying symmetric cones.

This dissertation is organized as follows. In the remaining part of this chapter, we outline a minimal foundation of the theory of Euclidean Jordan algebras. Based on Euclidean Jordan algebras, in Chapter 2 we explicitly introduce the constructive definition of the SSP problem in both the primal and dual standard forms and then introduce six general classes of optimization problems that can be cast as SSPs. The focus of Chapter 3 is on extending the work of Mehrotra and Özevin [25] to the case of SSPs by deriving a class of logarithmic barrier decomposition algorithms for SSPs and establishing the polynomial complexity of the resulting algorithms. In Chapter 4, we extend the work of Ariyawansa and Zhu [10] to the case of SSPs by deriving a class of volumetric barrier decomposition algorithms for SSPs and proving the polynomial complexity of certain members of the class of algorithms. Chapter 5 is devoted to describing four applications leading to, indeed, two important special cases of SSPs. More specifically, we describe the stochastic Euclidean facility location problem and the portfolio optimization problem with loss risk constraints as two applications of SSPs when the symmetric cones K_1 and K_2 are both second-order cones, and then describe the optimal covering random ellipsoid problem and an application in structural optimization as two applications of SSPs when the symmetric cones K_1 and K_2 are both rotated quadratic cones (see §2.2 for definitions). The material in Chapters 3-5 is independent, but it depends essentially on the material of Chapters 1 and 2. So, after reading Chapter 2, one can proceed directly to Chapter 3 and/or Chapter 4 concerning theoretical results (algorithms and complexity analysis), and/or to Chapter 5 concerning application models (see Figure 1.1). In Chapter 6, we propose the so-called multi-order cone programming problem as a new conic optimization problem. This problem is beyond the scope of this dissertation because it is over non-symmetric cones, so we leave it as an unsolved open problem. We indicate weak and strong duality relations of this optimization problem and describe an application of it. We conclude this dissertation in Chapter 7 by summarizing its contributions and indicating some possible future research directions.

Figure 1.1: The material has been organized into seven chapters. After reading Chapter 2, the reader can proceed directly to Chapter 3, Chapter 4, and/or Chapter 5.

We gave a concise definition of symmetric cones immediately after Problem (1.1.1, 1.1.2). In §1.2, we write down this definition more explicitly, and in §1.3 we review the theory of Euclidean Jordan algebras that connects all symmetric cones. The text of Faraut and Korányi [18] covers the foundations of this theory. In Chapter 2, the reader will see how general and abstract the problem is. In order to make our presentation concrete, we will describe two interesting examples of symmetric cones in some detail throughout this chapter: the second-order cone and the cone of real symmetric positive semidefinite matrices. Our presentation in §1.2 and §1.3 mostly follows that of [18] and [36], and, in particular, most examples in §1.3 are taken from [36].

We now introduce some notations that will be used in the sequel. We use R to denote the field of real numbers. If A ⊆ R^k and B ⊆ R^l, then the Cartesian product of A and B is A × B := {(x; y) : x ∈ A and y ∈ B}. We use R^{m×n} to denote the vector space of real m × n matrices. The matrices 0_n, I_n ∈ R^{n×n} denote, respectively, the zero and the identity matrices of order n (we will write 0_n and I_n as 0 and I when n is known from the context). All vectors we use are column vectors, with superscript "T" indicating transposition. We use "," for adjoining vectors and matrices in a row, and use ";" for adjoining them in a column. So, for example, if x, y, and z are vectors, we have

(x; y; z) = (x^T, y^T, z^T)^T.

For each vector x ∈ R^n whose first entry is indexed with 0, we write x̄ for the subvector consisting of entries 1 through n − 1; therefore x = (x_0; x̄) ∈ R × R^{n−1}. We let E^n denote the n-dimensional real vector space R × R^{n−1} whose elements x are indexed from 0, and we denote the space of real symmetric matrices of order n by S^n.

1.2 What is a symmetric cone?

This section and the next section are elementary. We start with some basic definitions finally leading us to the definition of the symmetric cone.

Let V be a finite-dimensional Euclidean vector space over R with inner product ⟨·, ·⟩. A subset S of V is said to be convex if it is closed with respect to convex combinations of finite subsets of S, i.e., for any λ ∈ (0, 1), x, y ∈ S implies that λx + (1 − λ)y ∈ S. A subset K ⊂ V is said to be a cone if it is closed under scalar multiplication by positive real numbers, i.e., if for any λ > 0, x ∈ K implies that λx ∈ K. A convex cone is a cone that is also a convex set. A cone is said to be closed if it is closed with respect to taking limits, solid if it has a nonempty interior, pointed if it contains no lines or, alternatively, if it does not contain two opposite nonzero vectors (so the origin is an extreme point), and regular if it is a closed, convex, pointed, solid cone.

We denote by GL(n, R) the general linear group of degree n over R (i.e., the set of n × n invertible matrices with entries from R, together with the operation of ordinary matrix multiplication). For a regular cone K ⊂ V, we denote by int(K) the interior of K, and by Aut(K) the automorphism group of K, i.e., Aut(K) := {ϕ ∈ GL(n, R) : ϕ(K) = K}.

Definition 1.2.1. Let V be a finite-dimensional real Euclidean space. A regular cone K ⊂ V is said to be homogeneous if for each u, v ∈ int(K), there exists an invertible linear map ϕ : V −→ V such that

1. ϕ(K) = K, i.e., ϕ is an automorphism of K, and

2. ϕ(u) = v.

In other words, Aut(K) acts transitively on the interior of K.

A regular cone K ⊂ V is said to be self-dual if it coincides with its dual cone K^*, i.e.,

K = K^* := {s ∈ V : ⟨x, s⟩ ≥ 0, ∀x ∈ K},

and symmetric if it is both homogeneous and self-dual.

Almost all conic optimization problems in real-world applications are associated with symmetric cones such as the nonnegative orthant cone, the second-order cone (see below for definitions), and the cone of positive semidefinite matrices over the real or complex numbers. Except for Chapter 6, which proposes an unsolved open optimization problem over a non-symmetric cone, our focus in this dissertation is on those cones that are symmetric. The material in the rest of this section is not needed later because the theory of Jordan algebras will lead us independently to the same results. However, we include this material for the sake of clarity of the examples and completeness.

Example 1. The second-order cone E^n_+. We show that the second-order cone (also known as the quadratic, Lorentz, or ice cream cone) of dimension n,

E^n_+ := {ξ ∈ E^n : ξ_0 ≥ ||ξ̄||_2},

with the standard inner product ⟨ξ, ζ⟩ := ξ^T ζ, is a symmetric cone.

In Lemma 6.2.1, we prove that the dual cone of the pth-order cone is the qth-order cone, i.e.,

{ξ ∈ E^n : ξ_0 ≥ ||ξ̄||_p}^* = {ξ ∈ E^n : ξ_0 ≥ ||ξ̄||_q},

where 1 ≤ p ≤ ∞ and q is conjugate to p in the sense that 1/p + 1/q = 1. Picking p = 2 (hence q = 2) implies that (E^n_+)^* = E^n_+. This demonstrates the self-duality of E^n_+. So, to prove that the cone E^n_+ is symmetric, we only need to show that it is homogeneous. The proof follows that of Example 1 in §2 of [18, Chapter I].

First, notice that E^n_+ can be redefined as

E^n_+ := {ξ ∈ E^n : ξ^T J ξ ≥ 0, ξ_0 ≥ 0},   where   J := ( 1    0^T
                                                             0   −I_{n−1} ).

Notice also that each element of the group G := {A ∈ R^{n×n} : A^T J A = J} maps E^n_+ onto itself (because, for every A ∈ G, we have that (Aξ)^T J (Aξ) = ξ^T J ξ), and so does the direct product H := R_+ × G. It now remains to show that the group H acts transitively on the interior of E^n_+. To do so, it is enough to show that, for any x ∈ int(E^n_+), there exists an element in H that maps e to x, where e := (1; 0) ∈ E^n.

Note that we may write x as x = λy with λ = √(x^T J x) and y ∈ E^n. Moreover, there exists a reflector matrix Q such that ȳ = Q(0; ... ; 0; r), with r = ||ȳ||_2. We then have

y_0² − r² = y_0² − ||ȳ||_2² = y^T J y = (1/λ²) x^T J x = 1.

Therefore, there exists t ≥ 0 such that y_0 = cosh t and r = sinh t. We define

Q̂ := ( 1    0^T
        0    Q )

and

H_t := ( cosh t    0^T        sinh t
         0         I_{n−2}    0
         sinh t    0^T        cosh t ).

Observing that Q̂, H_t ∈ G yields that Q̂H_t ∈ G, and therefore λQ̂H_t ∈ H. The result follows by noting that x = λ Q̂ H_t e.
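The homogeneity argument above is constructive, so it can be checked numerically. The following sketch is an added illustration (not part of the original text); it assumes NumPy, and the helper name householder_map is ours. It builds the reflector Q and the hyperbolic rotation H_t and verifies that x = λ Q̂ H_t e for a randomly chosen interior point x of E^n_+.

    import numpy as np

    def householder_map(a, b):
        # Reflector Q with Q a = b, assuming ||a|| == ||b||.
        v = a - b
        if np.allclose(v, 0):
            return np.eye(len(a))
        return np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)

    rng = np.random.default_rng(0)
    n = 5
    xbar = rng.standard_normal(n - 1)
    x = np.concatenate(([np.linalg.norm(xbar) + 1.0], xbar))   # interior point of E^n_+

    J = np.diag([1.0] + [-1.0] * (n - 1))
    lam = np.sqrt(x @ J @ x)
    y = x / lam
    r = np.linalg.norm(y[1:])
    Q = householder_map(r * np.eye(n - 1)[:, -1], y[1:])       # maps (0,...,0,r) to ybar
    t = np.arccosh(y[0])

    Qhat = np.block([[np.ones((1, 1)), np.zeros((1, n - 1))],
                     [np.zeros((n - 1, 1)), Q]])
    Ht = np.eye(n)
    Ht[0, 0] = Ht[-1, -1] = np.cosh(t)
    Ht[0, -1] = Ht[-1, 0] = np.sinh(t)

    e = np.zeros(n); e[0] = 1.0
    print(np.allclose(lam * Qhat @ Ht @ e, x))      # x = lambda * Qhat * H_t * e
    print(np.allclose(Ht.T @ J @ Ht, J))            # H_t preserves the quadratic form J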

Example 2. The cone of real symmetric positive semidefinite matrices S^n_+. The residents of the interior of the cone of real symmetric positive semidefinite matrices of order n, S^n_+, are the positive definite matrices; the residents of the boundary of this cone are the singular positive semidefinite matrices (having at least one zero eigenvalue); and the only matrix that resides at the origin is the positive semidefinite matrix having all zero eigenvalues. We now show that S^n_+, with the Frobenius inner product ⟨U, V⟩ := trace(UV), is a symmetric cone.

We first verify that S^n_+ is self-dual, i.e.,

S^n_+ = (S^n_+)^* = {U ∈ S^n : trace(UV) ≥ 0 for all V ∈ S^n_+}.

We can easily prove that S^n_+ ⊆ (S^n_+)^* by assuming that X ∈ S^n_+ and observing that, for any Y ∈ S^n_+, we have

trace(XY) = trace(X^{1/2} Y X^{1/2}) ≥ 0.

Here we used the positive semidefiniteness of X^{1/2} Y X^{1/2} to obtain the inequality. In fact, any U ∈ S^n can be written as U = QΛQ^T, where Q ∈ R^{n×n} is an orthogonal matrix and Λ ∈ R^{n×n} is a diagonal matrix whose diagonal entries are the eigenvalues of U (see, for example, Watkins [43, Theorem 5.4.20]). Therefore, if U ∈ S^n_+ we have that

trace(U) = trace(QΛQ^T) = trace(ΛQ^TQ) = trace(Λ) = Σ_{i=1}^{n} λ_i ≥ 0.

To prove that (S^n_+)^* ⊆ S^n_+, assume that X ∉ S^n_+; then there exists a nonzero vector y ∈ R^n such that trace(Xyy^T) = y^T X y < 0, which shows that X ∉ (S^n_+)^*.

So, to prove that the cone S^n_+ is symmetric, it remains to show that it is homogeneous. The proof follows exactly that of Example 2 in §2 of [18, Chapter I]. For P ∈ GL(n, R), we define a linear map ϕ_P : S^n −→ S^n by

ϕ_P(X) = PXP^T.

Note that ϕ_P maps S^n_+ into itself. By the Cholesky decomposition theorem, if X ∈ int(S^n_+), then X can be decomposed into a product X = LL^T = ϕ_L(I_n), where L ∈ GL(n, R), which establishes the result.
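The two ingredients of this example, self-duality and the Cholesky-based homogeneity map ϕ_L, are easy to sanity-check numerically. The sketch below is an added illustration under the assumption that NumPy is available; it uses randomly generated matrices.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4

    # phi_P(X) = P X P^T maps S^n_+ into itself: eigenvalues of the image stay nonnegative.
    P = rng.standard_normal((n, n))
    X = rng.standard_normal((n, n)); X = X @ X.T                 # a PSD matrix
    print(np.all(np.linalg.eigvalsh(P @ X @ P.T) >= -1e-10))

    # Homogeneity: for X in int(S^n_+), Cholesky gives X = L L^T = phi_L(I_n).
    Xpd = X + n * np.eye(n)                                      # positive definite
    L = np.linalg.cholesky(Xpd)
    print(np.allclose(L @ L.T, Xpd))

    # Self-duality sample check: trace(U V) >= 0 for PSD U and V.
    V = rng.standard_normal((n, n)); V = V @ V.T
    print(np.trace(Xpd @ V) >= 0)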

1.3 Symmetric cones and Euclidean Jordan algebras

In this section we review some of the basic definitions and theory of Euclidean Jordan algebras that are necessary for our subsequent development. We will also see how we can use Euclidean Jordan algebras to obtain symmetric cones.

Let J be a finite-dimensional vector space over R. A map ◦ : J × J −→ J is called bilinear if for all x, y, z ∈ J and for all α, β ∈ R, we have that (αx + βy) ◦ z = α(x ◦ z) + β(y ◦ z) and x ◦ (αy + βz) = α(x ◦ y) + β(x ◦ z).

Definition 1.3.1. A finite-dimensional vector space J over R is called an algebra over R if a bilinear map ◦ : J × J −→ J exists.

Let x be an element in an algebra J; then for n ≥ 2 we define x^n recursively by x^n := x ◦ x^{n−1}.

Definition 1.3.2. Let J be a finite-dimensional R algebra with a bilinear product ◦ : J × J −→ J . Then (J , ◦) is called a Jordan algebra if for all x, y ∈ J

11 1. x ◦ y = y ◦ x (commutativity);

2. x ◦ (x² ◦ y) = x² ◦ (x ◦ y) (Jordan's axiom).

The product x ◦ y between two elements x and y of a Jordan algebra (J, ◦) is called the Jordan multiplication between x and y. A Jordan algebra (J, ◦) has an identity element if there exists a (necessarily unique) element e ∈ J such that x ◦ e = e ◦ x = x for all x ∈ J. A Jordan algebra (J, ◦) is not necessarily associative, that is, x ◦ (y ◦ z) = (x ◦ y) ◦ z may not hold in general. However, it is power associative, i.e., x^p ◦ x^q = x^{p+q} for all integers p, q ≥ 1.

Example 3. It is easy to see that the space R^{n×n} of n × n real matrices with the Jordan multiplication X ◦ Y := (XY + YX)/2 forms a Jordan algebra with identity I_n.

Example 4. It can be verified that the space E^n with the Jordan multiplication x ◦ y := (x^T y; x_0 ȳ + y_0 x̄) forms a Jordan algebra with identity (1; 0) ∈ E^n.
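The two Jordan products of Examples 3 and 4 are both easy to implement, and Jordan's axiom from Definition 1.3.2 can be checked on random data. The sketch below is an added illustration assuming NumPy; the helper names jordan_sym and jordan_soc are ours.

    import numpy as np

    def jordan_sym(X, Y):
        # Jordan product on S^n (Example 3): X o Y = (XY + YX)/2.
        return 0.5 * (X @ Y + Y @ X)

    def jordan_soc(x, y):
        # Jordan product on E^n (Example 4): x o y = (x^T y; x0*ybar + y0*xbar).
        return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

    rng = np.random.default_rng(2)

    # Jordan's axiom x o (x^2 o y) = x^2 o (x o y) on S^3.
    A = rng.standard_normal((3, 3)); X = A + A.T
    B = rng.standard_normal((3, 3)); Y = B + B.T
    X2 = jordan_sym(X, X)
    print(np.allclose(jordan_sym(X, jordan_sym(X2, Y)), jordan_sym(X2, jordan_sym(X, Y))))

    # The same axiom on E^4, together with the identity e = (1; 0).
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    x2 = jordan_soc(x, x)
    print(np.allclose(jordan_soc(x, jordan_soc(x2, y)), jordan_soc(x2, jordan_soc(x, y))))
    e = np.zeros(4); e[0] = 1.0
    print(np.allclose(jordan_soc(x, e), x))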

Definition 1.3.3. A Jordan algebra J is called Euclidean if there exists an inner product ⟨·, ·⟩ on (J, ◦) such that for all x, y, z ∈ J

1. ⟨x, x⟩ > 0 for all x ≠ 0 (positive definiteness);

2. ⟨x, y⟩ = ⟨y, x⟩ (symmetry);

3. ⟨x, y ◦ z⟩ = ⟨x ◦ y, z⟩ (associativity).

That is, J admits a positive definite, symmetric, quadratic form which is also associative.

In the sequel, we consider only Euclidean Jordan algebras with identity. We simply denote the Euclidean Jordan algebra (J , ◦) by J .

Example 5. The space R^{n×n} is not a Euclidean Jordan algebra. However, under the operation "◦" defined in Example 3, the subspace S^n is a Jordan subalgebra of R^{n×n} and, indeed, is a Euclidean Jordan algebra with the inner product ⟨X, Y⟩ = trace(X ◦ Y) = trace(XY). While both the symmetry and the associativity can be easily proved by using the fact that trace(XY) = trace(YX), the positive definiteness can be immediately obtained by observing that trace(X²) > 0 for X ≠ 0.

Example 6. It is easy to verify that the space E^n (with "◦" defined as in Example 4) is a Euclidean Jordan algebra with the inner product ⟨x, y⟩ = x^T y.

Many of the results below also hold for the general setting of Jordan algebras, but here we focus entirely on Euclidean Jordan algebras with identity as that generality is not needed for our subsequent development. Since a Euclidean Jordan algebra J is power associative, we can define the concepts of rank, minimal and characteristic polynomials, eigenvalues, trace, and determinant for it.

Definition 1.3.4. Let J be a Euclidean Jordan algebra. Then

1. for x ∈ J, deg(x) := min{r > 0 : {e, x, x², ..., x^r} is linearly dependent} is called the degree of x;

2. rank(J) := max_{x∈J} deg(x) is called the rank of J.

Let x be an element of degree d in a Euclidean Jordan algebra J . We define R[x] as the set of all polynomials in x over R:

R[x] := { Σ_{i=0}^{∞} a_i x^i : a_i ∈ R, a_i = 0 for all but a finite number of i }.

Since {e, x, x², ..., x^d} is linearly dependent, there exists a nonzero monic polynomial q ∈ R[x] of degree d such that q(x) = 0. In other words, there are real numbers a_1(x), a_2(x), ..., a_d(x), not all zero, such that

q(x) := x^d − a_1(x) x^{d−1} + a_2(x) x^{d−2} + ··· + (−1)^d a_d(x) e = 0.

Clearly q is of minimal degree among those polynomials in R[x] which have the above properties, so we call it (or, alternatively, the polynomial p(λ) := λ^d − a_1(x) λ^{d−1} + a_2(x) λ^{d−2} + ··· + (−1)^d a_d(x)) the minimal polynomial of x. Note that the minimal polynomial of an element x ∈ J is unique, because otherwise, since it is monic, we would have a contradiction to the minimality of its degree. An element x ∈ J is called regular if deg(x) = rank(J). If x is a regular element of a Euclidean Jordan algebra, then we define its characteristic polynomial to be equal to its minimal polynomial. We have the following proposition.

Proposition 1.3.1 ([18, Proposition II.2.1]). Let J be an algebra with rank r. The set of regular elements is open and dense in J. There exist polynomials a_1, a_2, ..., a_r such that the minimal polynomial of every regular element x is given by p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + ··· + (−1)^r a_r(x). The polynomials a_1, a_2, ..., a_r are unique and a_i is homogeneous of degree i.

The polynomial p(λ) is called the characteristic polynomial of the regular element x. Since the set of regular elements is dense in J, by continuity we can extend characteristic polynomials to all x in J. So, the minimal polynomial coincides with the characteristic polynomial for regular elements and divides the characteristic polynomial of non-regular elements.

Definition 1.3.5. Let x be an element in a rank-r algebra J; then its eigenvalues are the roots λ_1, λ_2, ..., λ_r of its characteristic polynomial p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + ··· + (−1)^r a_r(x).

Whereas the minimal polynomial has only simple roots, it is possible, in the case of non-regular elements, that the characteristic polynomial has multiple roots. Indeed, the characteristic and minimal polynomials have the same set of roots, except for their multiplicities. In fact, we can define the degree of x to be the number of distinct eigenvalues of x.

Definition 1.3.6. Let x be an element in a rank-r algebra J, and let λ_1, λ_2, ..., λ_r be the roots of its characteristic polynomial p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + ··· + (−1)^r a_r(x). Then

1. trace(x) := λ_1 + λ_2 + ··· + λ_r = a_1(x) is the trace of x in J;

2. det(x) := λ_1 λ_2 ··· λ_r = a_r(x) is the determinant of x in J.

Example 7. All these concepts (characteristic polynomials, eigenvalues, trace, determinant, etc.) agree with the corresponding concepts used in S^n. Observe that rank(S^n) = n because deg(X) is the number of distinct eigenvalues of X, which is, indeed, at most n.

Example 8. We can easily prove the following quadratic identity for x ∈ E^n:

x² − 2x_0 x + (x_0² − ||x̄||_2²) e = 0.

Thus, we can define the polynomial p(λ) := λ² − 2x_0 λ + (x_0² − ||x̄||_2²) as the characteristic polynomial of x in E^n, and its two roots, λ_{1,2} = x_0 ± ||x̄||_2, are the eigenvalues of x. We also have that trace(x) = 2x_0 and det(x) = x_0² − ||x̄||_2². Observe also that λ_1 = λ_2 if and only if x̄ = 0, and therefore x is a multiple of the identity. Thus, every x ∈ E^n − {αe : α ∈ R} has degree 2. This implies that rank(E^n) = 2, which is, unlike the rank of S^n, independent of the dimension of the underlying vector space.
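The eigenvalues, trace, determinant, and characteristic polynomial of Example 8 can all be verified numerically for a random element of E^n. The sketch below is an added illustration assuming NumPy; the helper jordan_soc is ours.

    import numpy as np

    def jordan_soc(u, v):
        return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

    rng = np.random.default_rng(3)
    x = rng.standard_normal(5)                  # x = (x0; xbar) in E^5
    x0, xbar = x[0], x[1:]

    lam1, lam2 = x0 + np.linalg.norm(xbar), x0 - np.linalg.norm(xbar)
    print(np.isclose(lam1 + lam2, 2 * x0))                       # trace(x) = 2 x0
    print(np.isclose(lam1 * lam2, x0**2 - xbar @ xbar))          # det(x) = x0^2 - ||xbar||^2

    # x satisfies its characteristic polynomial: x^2 - 2 x0 x + det(x) e = 0.
    e = np.zeros(5); e[0] = 1.0
    resid = jordan_soc(x, x) - 2 * x0 * x + (x0**2 - xbar @ xbar) * e
    print(np.allclose(resid, 0))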

For an element x ∈ J, let bxc : J −→ J be the linear map defined by bxcy := x ◦ y for all y ∈ J. Note that bxce = x and bxcx = x². Note also that the operators bxc and bx²c commute because, by Jordan's axiom, bxcbx²cy = bx²cbxcy.

Example 9. An equivalent way of dealing with symmetric matrices is dealing with vectors obtained from the vectorization of symmetric matrices. The operator vec : S^n −→ R^{n²} creates a column vector from a matrix X by stacking its column vectors below one another. Note that

vec(X ◦ Y) = vec((XY + YX)/2) = (1/2)(vec(XYI) + vec(IYX)) = (1/2)(I ⊗ X + X ⊗ I) vec(Y) =: bXc vec(Y),

where we used the fact that vec(ABC) = (C^T ⊗ A) vec(B) to obtain the last equality (here, the operator ⊗ : R^{m×n} × R^{k×l} −→ R^{mk×nl} is the Kronecker product, which maps the pair of matrices (A, B) into the matrix A ⊗ B whose (i, j) block is a_{ij} B for i = 1, 2, ..., m and j = 1, 2, ..., n). This gives the explicit formula of the b·c operator for S^n.
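The Kronecker-product formula for bXc can be checked directly on random symmetric matrices. This short sketch is an added illustration assuming NumPy; note that vec is taken column-major, as in the definition above.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 3
    A = rng.standard_normal((n, n)); X = A + A.T
    B = rng.standard_normal((n, n)); Y = B + B.T

    vec = lambda M: M.flatten(order="F")        # column-major stacking, as in Example 9
    L_X = 0.5 * (np.kron(np.eye(n), X) + np.kron(X, np.eye(n)))   # the operator bXc
    print(np.allclose(L_X @ vec(Y), vec(0.5 * (X @ Y + Y @ X))))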

Example 10. The explicit formula of the b·c operator for E^n can be immediately given by considering the following multiplication:

x ◦ y = (x^T y; x_0 ȳ + y_0 x̄) = Arw(x) y,   where   Arw(x) := ( x_0    x̄^T
                                                                   x̄     x_0 I ) = bxc.

Here Arw(x) ∈ S^n is the arrow-shaped matrix associated with the vector x ∈ E^n.
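The arrow-shaped matrix is equally easy to check: Arw(x) applied to y should reproduce the Jordan product of Example 4. The sketch below is an added illustration assuming NumPy; the helper arw is ours.

    import numpy as np

    def arw(x):
        # Arrow-shaped matrix Arw(x) of Example 10, i.e. bxc for the algebra E^n.
        A = x[0] * np.eye(len(x))
        A[0, 1:], A[1:, 0] = x[1:], x[1:]
        return A

    rng = np.random.default_rng(5)
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    xy = np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))    # x o y
    print(np.allclose(arw(x) @ y, xy))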

For x, y ∈ J , let dx, ye : J × J −→ J be the quadratic operator defined by

dx, ye := bxcbyc + bycbxc − bx ◦ yc.

Therefore, in addition to bxc, we can define another linear map dxe : J −→ J associated with x that is called the quadratic representation and defined by

dxe := 2bxc² − bx²c = dx, xe.

Example 11. Continuing Example 9, we have

dX, Ze vec(Y) = bXcbZc vec(Y) + bZcbXc vec(Y) − bX ◦ Zc vec(Y)
             = vec(X ◦ (Z ◦ Y)) + vec(Z ◦ (X ◦ Y)) − vec((X ◦ Z) ◦ Y)
             = vec((XZY + XYZ + ZYX + YZX)/4) + vec((ZXY + ZYX + XYZ + YXZ)/4) − vec((XZY + ZXY + YXZ + YZX)/4)
             = (1/2)(vec(XYZ) + vec(ZYX))
             = (1/2)(X ⊗ Z + Z ⊗ X) vec(Y).

This shows that dX, Ze = (1/2)(X ⊗ Z + Z ⊗ X) and, in particular, dXe = X ⊗ X.

Example 12. Continuing Example 10, we can easily verify that

dxe = 2Arw²(x) − Arw(x²) = ( ||x||_2²     2x_0 x̄^T
                              2x_0 x̄      det(x) I + 2x̄x̄^T ).
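The closed form of dxe in Example 12 can be compared against the defining expression 2Arw²(x) − Arw(x²) on a random vector. This added illustration assumes NumPy; the helper arw is ours.

    import numpy as np

    def arw(x):
        A = x[0] * np.eye(len(x))
        A[0, 1:], A[1:, 0] = x[1:], x[1:]
        return A

    rng = np.random.default_rng(6)
    x = rng.standard_normal(5)
    x0, xbar = x[0], x[1:]
    xx = np.concatenate(([x @ x], 2 * x0 * xbar))           # x o x = x^2

    P = 2 * arw(x) @ arw(x) - arw(xx)                       # dxe = 2 Arw(x)^2 - Arw(x^2)

    # Closed form of Example 12.
    detx = x0**2 - xbar @ xbar
    Q = np.block([[np.array([[x @ x]]), 2 * x0 * xbar[None, :]],
                  [2 * x0 * xbar[:, None], detx * np.eye(len(xbar)) + 2 * np.outer(xbar, xbar)]])
    print(np.allclose(P, Q))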

Notice that bec = dee = I, trace(e) = r, and det(e) = 1 (since all the eigenvalues of e are equal to one). Notice also that the linear operator bxc is symmetric with respect to ⟨·, ·⟩ because, by the associativity of the inner product ⟨·, ·⟩, we have that ⟨bxcy, z⟩ = ⟨x ◦ y, z⟩ = ⟨x, y ◦ z⟩ = ⟨y, bxcz⟩. This implies that dxe is also symmetric with respect to ⟨·, ·⟩. It is also easy to see that bx, y + zc = bx, yc + bx, zc, and consequently dx, y + ze = dx, ye + dx, ze. As the operator dXe in Example 11 plays an important role in the development of interior point methods for DSDP (SSDP) and the operator in Example 12 plays an important role in the development of interior point methods for DSOCP (SSOCP), we would expect that the operator dxe will play a similar role in the development of interior point methods for DSP (SSP).

A spectral decomposition is a decomposition of x into idempotents together with the eigenvalues. Recall that two elements c_1, c_2 ∈ J are said to be orthogonal if c_1 ◦ c_2 = 0.

A set of elements of J is orthogonal if all its elements are mutually orthogonal to each other. An element c ∈ J is said to be an idempotent if c² = c. An idempotent is primitive if it is nonzero and cannot be written as a sum of two (necessarily orthogonal) nonzero idempotents.

Definition 1.3.7. Let J be a Euclidean Jordan algebra. Then a subset {c1, c2,..., cr} of J is called:

1. a complete system of orthogonal idempotents if it is an orthogonal set of idempotents where c_1 + c_2 + ··· + c_r = e;

2. a Jordan frame if it is a complete system of orthogonal primitive idempotents.

Example 13. Let {q_1, q_2, ..., q_n} be an orthonormal subset (all its vectors are mutually orthogonal and of unit norm) of R^n. Then the set {q_1 q_1^T, q_2 q_2^T, ..., q_n q_n^T} is a Jordan frame in S^n. In fact, by the orthonormality, we have that

(q_i q_i^T)(q_j q_j^T) = 0_n if i ≠ j,   and   (q_i q_i^T)(q_j q_j^T) = q_i q_i^T if i = j,

and that Σ_{i=1}^{n} q_i q_i^T = I_n.

Example 14. Let x be a vector in E^n. It is easy to see that the set { (1/2)(1; x̄/||x̄||_2), (1/2)(1; −x̄/||x̄||_2) } is a Jordan frame in E^n.

Theorem 1.3.1 (Spectral decomposition (I), [18]). Let J be a Euclidean Jordan algebra with rank r. Then for x ∈ J there exist real numbers λ_1, λ_2, ..., λ_r and a Jordan frame c_1, c_2, ..., c_r such that x = λ_1 c_1 + λ_2 c_2 + ··· + λ_r c_r, and λ_1, λ_2, ..., λ_r are the eigenvalues of x.

It is immediately seen that the eigenvalues of elements of Euclidean Jordan algebras are always real, which is not the case for non-Euclidean Jordan algebras.

Example 15. It is known that for any X ∈ S^n there exist an orthogonal matrix Q ∈ R^{n×n} and a diagonal matrix Λ ∈ R^{n×n} such that X = QΛQ^T. In fact, λ_1, λ_2, ..., λ_n, the diagonal entries of Λ, and q_1, q_2, ..., q_n, the columns of Q, can be used to rewrite X equivalently as

X = λ_1 q_1 q_1^T + λ_2 q_2 q_2^T + ··· + λ_n q_n q_n^T,

which, in view of Example 13, gives the spectral decomposition (I) of X in S^n.

Example 16. Using Example 14, the spectral decomposition (I) of x in E^n can be obtained by considering the following identity:

x = (x_0 + ||x̄||_2) · (1/2)(1; x̄/||x̄||_2) + (x_0 − ||x̄||_2) · (1/2)(1; −x̄/||x̄||_2) = λ_1 c_1 + λ_2 c_2.
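The Jordan frame of Example 14 and the decomposition of Example 16 can be verified directly: the frame elements are orthogonal idempotents summing to e, and the weighted sum reproduces x. The sketch below is an added illustration assuming NumPy; the helper jordan_soc is ours.

    import numpy as np

    def jordan_soc(u, v):
        return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

    rng = np.random.default_rng(7)
    x = rng.standard_normal(5)
    x0, xbar = x[0], x[1:]
    r = np.linalg.norm(xbar)

    c1 = 0.5 * np.concatenate(([1.0],  xbar / r))
    c2 = 0.5 * np.concatenate(([1.0], -xbar / r))
    lam1, lam2 = x0 + r, x0 - r

    print(np.allclose(jordan_soc(c1, c1), c1))      # idempotent
    print(np.allclose(jordan_soc(c1, c2), 0))       # mutually orthogonal
    print(np.allclose(c1 + c2, np.eye(5)[0]))       # c1 + c2 = e = (1; 0)
    print(np.allclose(lam1 * c1 + lam2 * c2, x))    # spectral decomposition of x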

Theorem 1.3.2 (Spectral decomposition (II), [18]). Let J be a Euclidean Jordan algebra. Then for x ∈ J there exist unique real numbers λ_1, λ_2, ..., λ_k, all distinct, and a unique complete system of orthogonal idempotents c_1, c_2, ..., c_k such that x = λ_1 c_1 + λ_2 c_2 + ··· + λ_k c_k.

Continuing Examples 15 and 16, we have the following two examples [36].

Example 17. To write the spectral decomposition (II) of X in S^n, let λ_1 > λ_2 > ··· > λ_k be the distinct eigenvalues of X such that, for each i = 1, 2, ..., k, the eigenvalue λ_i has multiplicity m_i and orthogonal eigenvectors q_{i1}, q_{i2}, ..., q_{im_i}. Then X can be written as

X = λ_1 Σ_{j=1}^{m_1} q_{1j} q_{1j}^T + λ_2 Σ_{j=1}^{m_2} q_{2j} q_{2j}^T + ··· + λ_k Σ_{j=1}^{m_k} q_{kj} q_{kj}^T =: λ_1 C_1 + λ_2 C_2 + ··· + λ_k C_k,

where the set {C_1, C_2, ..., C_k} is an orthogonal system of idempotents and Σ_{i=1}^{k} C_i = I_n. Notice that for each eigenvalue λ_i, the matrix C_i is uniquely determined, even though the corresponding eigenvectors q_{i1}, q_{i2}, ..., q_{im_i} may not be unique.

Example 18. By Example 16, the eigenvalues of x ∈ E^n are λ_{1,2} = x_0 ± ||x̄||_2, and therefore, as mentioned earlier, only multiples of the identity have repeated eigenvalues. In fact, if x = αe for some α ∈ R, then its spectral decomposition (II) is simply αe (here {e} is the singleton system of orthogonal idempotents).

Definition 1.3.8. Let J be a Euclidean Jordan algebra. We say that x ∈ J is positive semidefinite (positive definite) if all its eigenvalues are nonnegative (positive).

Proposition 1.3.2 ([18, Proposition III.2.2]). If the elements x and y are positive definite, then the element bxcy is so.

Definition 1.3.9. We say that two elements x and y of a Euclidean Jordan algebra are simultaneously decomposed if there is a Jordan frame {c_1, c_2, ..., c_r} such that x = Σ_{i=1}^{r} λ_i c_i and y = Σ_{i=1}^{r} µ_i c_i.

For x ∈ J, it is now possible to rewrite the definition of x² as

x² := λ_1² c_1 + λ_2² c_2 + ··· + λ_k² c_k = x ◦ x.

We also have the following definition.

Definition 1.3.10. Let x be an element of a Euclidean Jordan algebra J with a spectral decomposition x = λ_1 c_1 + λ_2 c_2 + ··· + λ_k c_k. Then

1. the square root of x is x^{1/2} := λ_1^{1/2} c_1 + λ_2^{1/2} c_2 + ··· + λ_k^{1/2} c_k, whenever all λ_i ≥ 0, and undefined otherwise;

2. the inverse of x is x^{−1} := λ_1^{−1} c_1 + λ_2^{−1} c_2 + ··· + λ_k^{−1} c_k, whenever all λ_i ≠ 0, and undefined otherwise.

20 More generally, if f is any real valued continuous function, then it is also possible to extend the above definition to define f(x) as

f(x) := f(λ_1) c_1 + f(λ_2) c_2 + ··· + f(λ_k) c_k.

Observe that x^{−1} ◦ x = e. We call x invertible if x^{−1} is defined, and non-invertible or singular otherwise. Note that every positive definite element is invertible and its inverse is also positive definite.
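Definition 1.3.10 gives a concrete recipe for x^{1/2}, x^{−1}, and more general functions f(x) in the rank-2 algebra E^n, which can be checked numerically. The sketch below is an added illustration assuming NumPy; the helpers jordan_soc and soc_fun are ours.

    import numpy as np

    def jordan_soc(u, v):
        return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

    def soc_fun(x, f):
        # f(x) := f(lambda1) c1 + f(lambda2) c2 for the rank-2 algebra E^n (Definition 1.3.10).
        x0, xbar = x[0], x[1:]
        r = np.linalg.norm(xbar)
        c1 = 0.5 * np.concatenate(([1.0],  xbar / r))
        c2 = 0.5 * np.concatenate(([1.0], -xbar / r))
        return f(x0 + r) * c1 + f(x0 - r) * c2

    rng = np.random.default_rng(8)
    xbar = rng.standard_normal(4)
    x = np.concatenate(([np.linalg.norm(xbar) + 0.5], xbar))     # positive definite in E^5

    e = np.zeros(5); e[0] = 1.0
    x_inv = soc_fun(x, lambda t: 1.0 / t)
    x_sqrt = soc_fun(x, np.sqrt)
    print(np.allclose(jordan_soc(x_inv, x), e))             # x^{-1} o x = e
    print(np.allclose(jordan_soc(x_sqrt, x_sqrt), x))       # (x^{1/2})^2 = x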

Remark 1. The equality x ◦ y = e may not imply that y = x^{−1}, as can be seen in the following equality [18, Chapter II]:

( 1    0      ◦   ( 1    1      =   ( 1    0
  0   −1 )          1   −1 )          0    1 ),

where the first matrix is X with X = X^{−1}, the second is Y ≠ X^{−1}, and the product is I.

We define the differential operator D_x : J −→ J by

D_x f(x) := (d/dλ_1) f(λ_1) c_1 + (d/dλ_2) f(λ_2) c_2 + ··· + (d/dλ_k) f(λ_k) c_k.

The Jacobian matrix ∇_x f(x) is defined so that (∇_x f(x))^T y = (D_x f(x)) ◦ y for all y ∈ J. For n ≥ 2, we define D_x^n f(x) recursively by D_x^n f(x) := D_x(D_x^{n−1} f(x)). For instance, D_x^2 x = D_x(D_x x) = D_x e = 0 and D_x x^{−1} = −x^{−2}, provided that x is invertible. More generally, if y is a function of x and they are both simultaneously decomposed, then D_x f(y) := D_y f(y) ◦ D_x y. For example, D_x y^{−1} = −y^{−2} ◦ D_x y, provided that y is invertible. Note that, if y and z are functions of x and they are all simultaneously decomposed, then D_x(y ± z) = D_x y ± D_x z and D_x(y ◦ z) = (D_x y) ◦ z + y ◦ (D_x z). This differential operator is interesting and will play an important role when computing partial derivatives.

There is a one-to-one correspondence between Euclidean Jordan algebras and symmetric cones.

Definition 1.3.11. If J is a Euclidean Jordan algebra, then its cone of squares is the set

K_J := {x² : x ∈ J}.

Such a one-to-one correspondence between (cones of squares of) Euclidean Jordan algebras and symmetric cones is given by the following fundamental result, which says that a cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan algebra.

Theorem 1.3.3 (Jordan algebraic characterization of symmetric cones, [18]). A regular cone K is symmetric iff K = K_J for some Euclidean Jordan algebra J.

The above result implies that an element is positive semidefinite if and only if it belongs to the cone of squares, and it is positive definite if and only if it belongs to the interior of the cone of squares. In other words, an element x in a Euclidean Jordan algebra J is positive semidefinite if and only if x ∈ K_J, and is positive definite if and only if x ∈ int(K_J), where int(K_J) denotes the interior of the cone K_J.

The following notations will be used throughout the dissertation. For a Euclidean Jordan algebra J, we write x ⪰_{K_J} 0 and x ≻_{K_J} 0 to mean that x ∈ K_J and x ∈ int(K_J), respectively. We also write x ⪰_{K_J} y (or y ⪯_{K_J} x) and x ≻_{K_J} y (or y ≺_{K_J} x) to mean that x − y ⪰_{K_J} 0 and x − y ≻_{K_J} 0, respectively.

Example 19. The cone K_{S^n} is, indeed, S^n_+, the cone of real symmetric positive semidefinite matrices of order n. This can be seen in view of the fact that a symmetric matrix is the square of another symmetric matrix if and only if it is a positive semidefinite matrix. To prove this fact, suppose that X is positive semidefinite; then it has nonnegative eigenvalues λ_1, λ_2, ..., λ_n and, by the spectral theorem for real symmetric matrices, there exist an orthogonal matrix Q ∈ R^{n×n} and a diagonal matrix Λ := diag(λ_1; λ_2; ... ; λ_n) ∈ R^{n×n} such that X = QΛQ^T. By letting Y := QΛ^{1/2}Q^T ∈ S^n, where Λ^{1/2} := diag(√λ_1; √λ_2; ... ; √λ_n), it follows that

X = QΛQ^T = QΛ^{1/2}Λ^{1/2}Q^T = (QΛ^{1/2}Q^T)(QΛ^{1/2}Q^T) = Y².

To prove the other direction, let us assume that X = Y² for some Y ∈ S^n. It is clear that X ∈ S^n. To show that X is positive semidefinite, let (λ, v) be an eigenpair of Y; then λ ∈ R (every real symmetric matrix has real eigenvalues). Furthermore, we have that Xv = Y²v = Y(λv) = λ²v. Thus, (λ², v) is an eigenpair of X, which means that X has only nonnegative eigenvalues, and this completes the proof.

Example 20. The cone of squares of E^n is the second-order cone of dimension n:

E^n_+ := {ξ ∈ E^n : ξ_0 ≥ ||ξ̄||_2}.

To see this, recall that the cone of squares (with respect to "◦" defined in Example 4) is

K_{E^n} = {ζ² : ζ ∈ E^n} = {(||ζ||_2²; 2ζ_0 ζ̄) : ζ ∈ E^n}.

Thus, any x ∈ K_{E^n} can be written as x = (||y||_2²; 2y_0 ȳ) for some y ∈ E^n. It follows that

x_0 = ||y||_2² = y_0² + ||ȳ||_2² ≥ 2|y_0| ||ȳ||_2 = ||x̄||_2,

where the inequality follows by observing that (|y_0| − ||ȳ||_2)² ≥ 0. This means that x ∈ E^n_+ and hence K_{E^n} ⊆ E^n_+. The proof of the other direction can be found in §4 of [1].

²The operator diag(·) maps its argument to a block diagonal matrix; for example, if x ∈ R^n, then diag(x) is the n × n diagonal matrix with the entries of x on the diagonal.

Theorem 1.3.4 ([18, Theorem III.1.5]). Let J be a Jordan algebra over R with the identity element e. The following properties are equivalent:

1. J is a Euclidean Jordan algebra.

2. The symmetric bilinear form trace(x ◦ y) is positive definite.

A direct consequence of the above theorem is that if J is a Euclidean Jordan algebra, then x • y := trace(x ◦ y) is an inner product. In the sequel, we define the inner product as ⟨x, y⟩ := x • y = trace(x ◦ y) and we call it the Frobenius inner product. It is easy to see that, for x, y, z ∈ J, x • e = trace(x), x • y = y • x, (x + y) • z = x • z + y • z, x • (y + z) = x • y + x • z, and (x ◦ y) • z = x • (y ◦ z).

For x ∈ J, the Frobenius norm (denoted by ||·||_F, or simply ||·||) is defined as ||x|| := √(x • x). We can also define various norms on J as functions of the eigenvalues. For example, the definition of the Frobenius norm can be rewritten as ||x|| := √(λ_1² + λ_2² + ··· + λ_k²) = √trace(x²) = √(x • x). The Cauchy-Schwarz inequality holds for the Frobenius inner product, i.e., for x, y ∈ J, |x • y| ≤ ||x|| ||y|| (see for example [34]). Note that ||e|| = √(e • e) = √r.

Let x ∈ J and t ∈ R. For a function g := g(x, t) from J × R into R, we will use "g′" for the partial derivative of g with respect to t, and "∇_x g", "∇²_{xx} g", "∇³_{xxx} g" to denote the gradient, the Hessian, and the third-order partial derivative of g with respect to x. For a function y := y(x, t) from J × R into J, we will also use "y′" for the partial derivative of y with respect to t, "D_x y" to denote the first partial derivative of y with respect to x, and "∇_x y" to denote the Jacobian matrix of y (with respect to x).

Let h_1, h_2, ..., h_k ∈ J. For a function f from J into R, we write

∇^k_{x...x} f(x)[h_1, h_2, ..., h_k] := ∂^k f(x + t_1 h_1 + ··· + t_k h_k) / (∂t_1 ··· ∂t_k) |_{t_1=···=t_k=0}

to denote the value of the kth differential of f taken at x along the directions h_1, h_2, ..., h_k.

We now present some handy tools that will help with our computations.

Lemma 1.3.1. Let J be a Euclidean Jordan algebra with identity e, and let x, y, z ∈ J. Then

1. {ln det(e + tx)}′ |_{t=0} = trace(x) and, more generally, {ln det(y + tx)}′ |_{t=0} = y^{−1} • x, provided that det(e + tx) and det(y + tx) are positive.

2. {trace((e + tx)^{−1})}′ |_{t=0} = −trace(x) and, more generally, {trace((e + tx ◦ y)^{−1})}′ |_{t=0} = −x • y, provided that e + tx is invertible.

3. ∇_x ln det x = x^{−1}, provided that det x is positive (so x is invertible). More generally, if y is a function of x, then ∇_x ln det y = (∇_x y)^T y^{−1}, provided that det y is positive.

4. ∇_x x^{−1} = −dx^{−1}e provided that x is invertible, and hence ∇²_{xx} ln det x = ∇_x x^{−1} = −dx^{−1}e. More generally, if y is a function of x, then ∇_x y^{−1} = −dy^{−1}e ∇_x y, provided that y is invertible.

5. ∇_x dxe[y] = 2dx, ye.

6. If x and y are functions of t, where t ∈ R, then (x ◦ y)′ = x ◦ y′ + x′ ◦ y. In other words, (bxcy)′ = bx′cy + bxcy′. Therefore, bxc′ = bx′c, dx, ye′ = dx, y′e + dx′, ye, and, in particular, dxe′ = 2dx, x′e.

7. ∇_x trace(x) = e = D_x x and, more generally, if y is a function of x and they are both simultaneously decomposed, then ∇_x trace(y) = (∇_x y)^T e = D_x y. Hence, if y and z are functions of x and they are all simultaneously decomposed, then ∇_x(y • z) = D_x(y ◦ z) = (D_x y) ◦ z + (D_x z) ◦ y = (∇_x y)^T z + (∇_x z)^T y.

The proofs of most of these statements are straightforward. We only indicate that item 1 follows from the facts that {det(e + tx)}′ |_{t=0} = trace(x) and that {det(y + tx)}′ |_{t=0} = det(y)(y^{−1} • x) [18, Proposition III.4.2]; the first statement in item 2 follows from spectral decomposition (I); the first statement in item 3 is taken from [18, Proposition III.4.2]; the first statement in item 4 is taken from [18, Proposition II.3.3]; item 5 is taken from §3 of [18, Chapter II]; the first statement in item 6 is taken from §4 of [18, Chapter II]; and the first statement in item 7 is obtained by using item 3 and the observation that det(exp(x)) = exp(trace(x)).
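Item 1 of Lemma 1.3.1 is easy to test numerically in the algebra S^n, where det and trace are the ordinary matrix determinant and trace. The sketch below is an added illustration assuming NumPy; it compares a central finite difference of ln det(Y + tX) at t = 0 with Y^{−1} • X = trace(Y^{−1}X).

    import numpy as np

    rng = np.random.default_rng(9)
    n = 4
    A = rng.standard_normal((n, n)); Y = A @ A.T + n * np.eye(n)    # det(Y) > 0
    B = rng.standard_normal((n, n)); X = B + B.T

    h = 1e-6
    numeric = (np.log(np.linalg.det(Y + h * X)) - np.log(np.linalg.det(Y - h * X))) / (2 * h)
    exact = np.trace(np.linalg.solve(Y, X))                          # Y^{-1} . X
    print(np.isclose(numeric, exact, rtol=1e-5))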

Definition 1.3.12. We say two elements x, y of a Euclidean Jordan algebra J operator commute if bxcbyc = bycbxc. In other words, x and y operator commute if for all z ∈ J , we have that x ◦ (y ◦ z) = y ◦ (x ◦ z).

We remind the reader of the fact that two matrices X, Y ∈ S^n commute if and only if XY is symmetric, if and only if X and Y can be simultaneously diagonalized, i.e., they share a common system of orthonormal eigenvectors (the same Q). This fact is generalized in the following theorem, which can also be applied to multiple-block elements.

Theorem 1.3.5 ([36, Theorem 27]). Two elements of a Euclidean Jordan algebra operator commute if and only if they are simultaneously decomposed.

Note that if two operator commuting elements x and y are invertible, then so is x ◦ y. Moreover, (x ◦ y)^{−1} = x^{−1} ◦ y^{−1} and det(x ◦ y) = det(x) det(y) (see also [18, Proposition II.2.2]). In Remark 1 we mentioned that the equality x ◦ y = e may not imply that y = x^{−1}. However, the equality x ◦ y = e does imply that y = x^{−1} when the elements x and y operator commute.

Lemma 1.3.2 (Properties of d·e). Let x and y be elements of a Euclidean Jordan algebra with rank r and dimension n, x invertible, and k be an integer. Then

1. det(dxey) = det²(x) det(y).

2. dxex^{−1} = x, dxee = x².

3. ddyexe = dyedxedye.

4. dx^ke = dxe^k.

The first three items of Lemma 1.3.2 are taken from [18, Chapters II and III] and the last one is taken from [36, §2].

We use "," for adjoining elements of a Euclidean Jordan algebra J in a row, and use ";" for adjoining them in a column. Thus, if J is a Euclidean Jordan algebra and x_i ∈ J for i = 1, 2, ..., m, then (x_1; x_2; ... ; x_m) denotes the column whose blocks are x_1, x_2, ..., x_m, which is an element of J × J × ··· × J (m times), and we write (x_1; x_2; ... ; x_m) = (x_1, x_2, ..., x_m)^T.

As we mentioned earlier, we also use the superscript "T" to indicate transposition of column vectors in R^n. We end this section with the following lemma, which is essentially a part of Lemma 12 in [36].

Lemma 1.3.3. Let x ∈ J with a spectral decomposition x = λ_1 c_1 + λ_2 c_2 + ··· + λ_r c_r. Then the following statements hold.

1. The matrices bxc and dxe commute and thus share a common system of eigenvectors.

2. Every eigenvalue of bxc has the form (1/2)(λ_i + λ_j) for some i, j ≤ r. In particular, x ⪰_{K_J} 0 (x ≻_{K_J} 0) if and only if bxc ⪰ 0 (bxc ≻ 0). The eigenvalues of x, the λ_i's, are amongst the eigenvalues of bxc.
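In the algebra E^n, where bxc = Arw(x), item 2 of Lemma 1.3.3 can be seen concretely: the eigenvalues of Arw(x) are λ_1 = x_0 + ||x̄||_2, λ_2 = x_0 − ||x̄||_2, and (λ_1 + λ_2)/2 = x_0 with multiplicity n − 2. The sketch below is an added illustration assuming NumPy; the helper arw is ours.

    import numpy as np

    def arw(x):
        A = x[0] * np.eye(len(x))
        A[0, 1:], A[1:, 0] = x[1:], x[1:]
        return A

    rng = np.random.default_rng(10)
    x = rng.standard_normal(6)
    r = np.linalg.norm(x[1:])
    lam1, lam2 = x[0] + r, x[0] - r                  # eigenvalues of x in E^6

    eigs = np.sort(np.linalg.eigvalsh(arw(x)))
    # Expected spectrum of bxc: lam1, lam2, and x0 = (lam1 + lam2)/2 with multiplicity n-2.
    expected = np.sort(np.concatenate(([lam1, lam2], np.full(len(x) - 2, x[0]))))
    print(np.allclose(eigs, expected))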

Chapter 2

Stochastic Symmetric Optimization Problems

In this chapter, we use the Jordan algebraic characterization of symmetric cones to define the SSP problem in both the primal and dual standard forms. We then see how this problem can include some general optimization problems as special cases.

2.1 The stochastic symmetric optimization problem

We define a problem based on the DSP problem analogous to the way SLP problem is defined based on the DLP problem. To do so, we first introduce the definition of a DSP. Let J be a Euclidean Jordan algebra with dimension n and rank r. The DSP problem and its dual [36] are

(P1)   min  c • x
       s.t. a_i • x = b_i,  i = 1, 2, ..., m,
            x ⪰_{K_J} 0;

(D1)   max  b^T y
       s.t. Σ_{i=1}^{m} y_i a_i ⪯_{K_J} c,
            y ∈ R^m,

where c, a_i ∈ J for i = 1, 2, ..., m, b ∈ R^m, x is the primal variable, y is the dual variable, and, as mentioned in §1.3, K_J is the cone of squares of J.

The pair (P1, D1) can be rewritten in the following compact form:

(P2)   min  c • x
       s.t. Ax = b,
            x ⪰_{K_J} 0;

(D2)   max  b^T y
       s.t. A^T y ⪯_{K_J} c,
            y ∈ R^m,

where A := (a_1; a_2; ... ; a_m) is a linear operator that maps J into R^m and A^T is a linear operator that maps R^m into J such that x • A^T y = (Ax)^T y. In fact, we can prove weak and strong duality properties for the pair (P2, D2) as justification for referring to them as a primal-dual pair; see, for example, [30].

In the rest of this section, we assume that m1, m2, n1, n2, r1, and r2 are positive integers, and J1 and J2 are Euclidean Jordan algebras with identities e1 and e2, dimensions n1 and n2, and ranks r1 and r2, respectively.

2.1.1 Definition of an SSP in primal standard form

We define the primal form of the SSP based on the primal form of the DSP. Let a_i, t_j ∈ J_1 and w_j ∈ J_2 for i = 1, 2, ..., m_1 and j = 1, 2, ..., m_2 be given. Let A := (a_1; a_2; ... ; a_{m_1}) be a linear operator that maps x to the m_1-dimensional vector whose ith component is a_i • x, b ∈ R^{m_1}, c ∈ J_1, T := (t_1; t_2; ... ; t_{m_2}) be a linear operator that maps x to the m_2-dimensional vector whose ith component is t_i • x, W := (w_1; w_2; ... ; w_{m_2}) be a linear operator that maps y to the m_2-dimensional vector whose ith component is w_i • y, h ∈ R^{m_2}, and d ∈ J_2. We also assume that A, b, and c are deterministic data, and that T, W, h, and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in primal standard form is

min  c • x + E[Q(x, ω)]
s.t. Ax = b                                                (2.1.1)
     x ⪰_{K_{J_1}} 0,

where Q(x, ω) is the minimum value of the problem

min  d(ω) • y
s.t. W(ω)y = h(ω) − T(ω)x                                  (2.1.2)
     y ⪰_{K_{J_2}} 0,

where x is the first-stage decision variable, y is the second-stage variable, and

E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

2.1.2 Definition of an SSP in dual standard form

In many applications we find that an SSP problem defined based on the dual standard form (D2) is more useful. In this part we define the dual form of the SSP based on the dual form of the DSP. Let a_i ∈ J_1 and t_i, w_j ∈ J_2 for i = 1, 2, ..., m_1 and j = 1, 2, ..., m_2 be given. Let A := (a_1, a_2, ..., a_{m_1}), b ∈ J_1, c ∈ R^{m_1}, T := (t_1, t_2, ..., t_{m_1}), W := (w_1, w_2, ..., w_{m_2}), h ∈ J_2, and d ∈ R^{m_2}. We also assume that A, b, and c are deterministic data, and that T, W, h, and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in dual standard form is

max  c^T x + E[Q(x, ω)]
s.t. Ax ⪯_{K_{J_1}} b,                                     (2.1.3)

where Q(x, ω) is the maximum value of the problem

max  d(ω)^T y
s.t. W(ω)y ⪯_{K_{J_2}} h(ω) − T(ω)x,                       (2.1.4)

where x ∈ R^{m_1} is the first-stage decision variable, y ∈ R^{m_2} is the second-stage variable, and

E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

In fact, it is also possible to define SSPs in“mixed forms” where the first-stage is based on primal problem (P2) while the second-stage is based on the dual problem (D2), and vice versa.

2.2 Problems that can be cast as SSPs

It is interesting to mention that almost all conic optimization problems in real world applications are associated with symmetric cones, and that, as an illustration of the modeling power of conic programming, all deterministic convex programming problems can be formulated as deterministic conic programs (see [29]). As a consequence, by considering the stochastic counterpart of this result, it is straightforward to show that all stochastic convex programming problems can be formulated as stochastic conic programs. In this section we will see how SSPs can include some general optimization problems as special cases. We start with two-stage stochastic linear programs with recourse.

Problem 1. Stochastic linear programs:

It is clear that the space of real numbers R with Jordan multiplication x ◦ y := xy and inner product x • y := xy forms a Euclidean Jordan algebra. Since a real number is a square of another real number if and only if it is nonnegative, the cone of squares of R is indeed R+; the set of all nonnegative real numbers. This verifies that R+ is a symmeric

p p cone. The cone R+ of nonnegative orthants of R is also symmetric because it is just

31 p Euclidean Jordan algebra R = {(x1; x2; ... ; xp): xi ∈ R, i = 1, 2, . . . , p} p p Symmetric cone R+ = {x ∈ R : xi ≥ 0, i = 1, 2, . . . , p} Conic inequality x  p 0 ≡ x ≥ 0 and x p 0 ≡ x > 0 R+ R+ Jordan multiplication x ◦ y = (x1y1; x2y2; ··· ; xpyp) = diag(x) y | {z } bxc T Inner product x • y = x1y1 + x2y2 + ··· + xpyp = x y Identity element e = (1; 1; ... ; 1) Spectral decomposition x = x1 (1; 0; ... ; 0) + x2 (0; 1; ... ; 0) + |{z} | {z } |{z} | {z } λ1 c1 λ2 c2 ··· + xp (0; 0; ... ; 1) |{z} | {z } λp cp p Cone rank rank(R+) = p Expression for trace trace(x) = x1 + x2 + ··· + xp Expression for determinant det(x) = x1 x2 ··· xp q 2 2 2 Expression for Frobenius norm ||x|| = x1 + x2 + ··· + xp = ||x||2 −1 1 1 1 Expression for inverse x = ( ; ; ... ; ) (if xi 6= 0 for all i) x1 x2 xp 1/2 √ √ √ Expression for square root x = ( x1; x2; ... ; xp) (if xi ≥ 0 for all i) Pp Log barrier function − ln det(x) = − i=1 ln xi if xi > 0 for all i Table 2.1: The Euclidean Jordan algebraic structure of the nonnegative orthant cone.

p time z }| { the Cartesian product of the symmetric cones R+, R+,..., R+. Table 2.1 summarizes the p Euclidean Jordan algebraic structure of the nonnegative orthant cone R+.

n1 n2 When J1 = R and J2 = R ; the spaces of vectors of dimensions n1 and n2, re- spectively, with Jordan multiplication x ◦ y := diag(x)y and the standard inner product

T n1 n2 x • y := x y, then KJ1 = R+ and KJ2 = R+ ; the cones of nonnegative orthants of Rn1 and Rn2 , respectively, and hence the SSP problem (2.1.1, 2.1.2) becomes the SLP problem: min cTx + E [Q(x, ω)] s.t. Ax = b x ≥ 0,

32 where Q(x, ω) is the minimum value of the problem

min d(ω)Ty s.t. W (ω)y = h(ω) − T (ω)x y ≥ 0.

Two-stage SLP with recourse has many practical applications, see for example [15].

Problem 2. Stochastic second-order cone programs:

n1 n2 T When J1 = E and J2 = E with Jordan multiplication x ◦ y := (x y, x0y¯ + y0x¯),

T n1 n2 and inner product x • y := x y, then KJ1 = E+ and KJ2 = E+ ; the second-order cones of dimensions n1 and n2, respectively (see Example 20), and hence we obtain SSOCP with recourse. Table 2.21 summarizes the Euclidean Jordan algebraic structure of the second-order cone in both single- and multiple-block forms. To introduce the SSOCP problem, we first introduce some notations that will be used throughout this part and in Chapter 5.

For simplicity sake, we write the single-block second-order cone inequality x  p 0 E+ p as x  0 (to mean that x ∈ E+) when p is known from the context, and the multiple- block second-order cone inequality x  p1 p2 pN 0 as x N 0 (to mean that x ∈ E+ ×E+ ×···×E+

p1 p2 pN E+ × E+ × · · · × E+ ) when p1, p2, . . . , pN are known from the context. It is immediately

p PN seen that, for every vector x ∈ R where p = i=1 pi, x N 0 if and only if x is partitioned conformally as x = (x1; x2; ... ; xN ) and xi  0 for i = 1, 2,...,N. We also write x  y (x N y) or y  x (y N x) to mean that x − y  0 (x − y N 0). We now are ready to introduce the definition of an SSOCP with recourse.

Let N1,N2 ≥ 1 be integers. For i = 1, 2,...,N1 and j = 1, 2,...,N2, let m1, m2, n1, n2,

PN1 PN2 n1i, n2j be positive integers such that n1 = i=1 n1i and n2 = i=1 n2j. An SSOCP with recourse in primal standard form is defined based on deterministic data A ∈ Rm1×n1 , b ∈  A 0  1The direct sum of two square matrices A and B is the block diagonal matrix A ⊕ B := . 0 B

33 p p−1 p p Euclidean Jordan alg. J E = {x : x = (x0; x¯) ∈ R × R } E 1 × · · · × E N p p p1 pN Symmetric cone KJ E+ = {x ∈ E : x0 ≥ ||x¯||2} E+ × · · · × E+

x KJ 0 (x KJ 0) x  0 (x 0) x N 0 (x N 0) T Jordan product x ◦ y (x y; x0y¯ + y0x¯) = Arw(x)y (x1 ◦ y1; ... ; xN ◦ yN ) T T T Inner product x • y x y x1 y1 + ··· + xN yN The identity e (1; 0) (e1; ... ; eN ) The matrix bxc Arw(x) Arw(x1) ⊕ · · · ⊕ Arw(xN ) 1  x¯  Spectral decomposition x = (x0 + ||x¯||2) 1; Follows from decomp. of | {z } 2 ||x¯||2 λ1 | {z } c1 1  x¯  + (x0 − ||x¯||2) 1; − each block xi, 1 ≤ i ≤ N | {z } 2 ||x¯||2 λ2 | {z } c2 rank(KJ ) 2 2N PN Expression for trace(x) λ1 + λ2 = 2x0 i=1 trace(xi) Expression for det(x) λ λ = x2 − ||x¯||2 ΠN det(x ) 1 2 0 √ i=1 i p 2 2 PN 2 Frobenius norm ||x|| λ1 + λ2 = 2||x||2 i=1 ||xi|| −1 −1 −1 Jx −1 −1 −1 Inverse x λ1 c1 + λ2 c2 = det(x) (x1 ; ... ; xN ) (if xi exists (if det(x) 6= 0; o/w x is singular) for all i; o/w x is singular) 2 2 PN Log barrier function − ln (x0 − ||x¯||2) − i=1 ln det xi Table 2.2: The Euclidean Jordan algebraic structure of the second-order cone in single- and multiple-block forms.

Rm1 and c ∈ Rn1 and random data T ∈ Rm2×n1 ,W ∈ Rm2×n2 , h ∈ Rm2 and d ∈ Rn2 whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P . Given this data, the two-stage SSOCP in the primal standard form is the problem min cTx + E [Q(x, ω)] s.t. Ax = b (2.2.1)

x N1 0, where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of the problem min d(ω)Ty s.t. W (ω)y = h(ω) − T (w)x (2.2.2)

y N2 0,

34 where y ∈ Rn2 is the second-stage variable and

Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

Note that if N1 = n1 and N2 = n2, then SSOCP (2.2.1, 2.2.2) reduces to SLP. In fact,

1 if N1 = n1, then ni = 1 for each i = 1, 2,...,N1, and so xi ∈ E+ := {t ∈ R : t ≥ 0} for each i = 1, 2,...,N1. Thus, the constraint x N1 0 means the same as x ≥ 0, i.e., x lies

n1 in the nonnegative orthant of R . The same situation occurs for y N2 0. Thus SLPs is a special case of SSOCP (2.2.1, 2.2.2). Stochastic quadratic programs (SQPs) are also a special case of SSOCPs. To demon- strate this, recall that a two-stage SMIQP with recourse is defined based on deterministic

n1 n1 m1×n1 m1 n2 n2 data C ∈ S+ , c ∈ R ,A ∈ R and b ∈ R ; and random data H ∈ S+ , d ∈ R ,T ∈ Rm2×n1 ,W ∈ Rm2×n2 , and h ∈ Rm2 whose realizations depend on an underlying outcome in an event space Ω with a known probability function P . Given this data, an SQP with recourse is T T min q1(x, ω) = x Cx + c x + E[Q(x, ω)] s.t. Ax = b (2.2.3) x ≥ 0, where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of the problem T T min q2(y, ω) = y H(ω)y + d(ω) y s.t. W (ω)y = h(ω) − T (ω)x (2.2.4) y ≥ 0, where y ∈ Rn2 is the second-stage decision variable and

Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

35 Observe that the objective function of (2.2.3) can be written as (see [1])

1 1 1 1 q (x , ω) = ||u¯||2 + [Q(x, ω)] − cTC−1c where u¯ = C /2 x + C− /2 c. 1 1 E 4 2

Similarly, the objective function of (2.2.4) can be written as

1 1 1 1 q (y, ω) = ||v¯||2 − d(ω)TH(ω)−1d (ω) where v¯ = H(ω) /2 y + H(ω)− /2 d(ω). 2 4 2

Thus, problem (2.2.3, 2.2.4) can be transformed into the SSOCP:

min u0

1 1 s.t. u¯ − C /2 x = 1 C− /2 c 2 (2.2.5) Ax = b u  0, x ≥ 0,

where Q(x, ω) is the minimum value of the problem

min v0

1 1 s.t. v¯ − H(ω) /2 y = 1 H(ω)− /2 d(ω) 2 (2.2.6) W (ω)y = h(ω) − T (ω)x v  0, y ≥ 0,

where Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

Note that the SQP problem (2.2.3, 2.2.4) and the SSOCP problem (2.2.5, 2.2.6) will have the same minimizing solutions, but their optimal objective values are equal up to constants. More precisely, the difference between the optimal objective values of (2.2.4,

1 T −1 2.2.6) would be − 2 d(ω) H(ω) d(ω). Similarly, the optimal objective values of (2.2.3,

36 2.2.4) and (2.2.5, 2.2.6) will differ by

1 1 Z − cTC−1c − d(ω)T H(ω)−1d(ω) P (dω). 2 2 Ω

In §5.1 of Chapter 5, we describe two applications that illustrate the applicability of two-stage SSOCPs with recourse.

Problem 3. Stochastic rotated quadratic cone programs:

For each vector x ∈ Rn indexed from 0, we write xˆ for the sub-vector consisting of entries n−2 ˆn 2 through n − 1; therefore x = (x0; x1; xˆ) ∈ R × R × R . We let E denote the n dimensional real vector space R × R × Rn−2 whose elements x are indexed with 0. The rotated quadratic cone [1] of dimension n is defined by

ˆn ˆn 2 E+ := {x = (x0; x1; xˆ) ∈ E : 2x0x1 ≥ ||xˆ|| , x0 ≥ 0, x1 ≥ 0}.

2 The constraint on x that satisfies the relation 2x0x1 ≥ ||xˆ|| is called a hyperbolic con- n ˆn straint. It is clear that the cones E+ and E+ have the same Euclidean Jordan algebraic structure. In fact, the later is obtained by rotating the former through an angle of thirty degrees in the x0x1-plane [1]. More specifically, by writing x ˆ 0 (x ˆ N 0)

ˆp ˆp1 ˆp2 ˆpN to mean that x ∈ E+ (x ∈ E+ × E+ × · · · E+ ), then one can easily see that the hyperbolic constraint (x0; x1; xˆ) ˆ 0 is equivalent to the second-order cone constraint

(2x0 + x1; 2x0 − x1; 2xˆ)  0. In fact,

2 2 2x0 + x1 ≥ ||(2x0 − x1; 2xˆ)||2 ⇐⇒ (2x0 + x1) ≥ ||(2x0 − x1; 2xˆ)||2

2 ⇐⇒ 4x0x1 ≥ −4x0x1 + 4||xˆ||2

2 ⇐⇒ 2x0x1 ≥ ||xˆ||2.

So, if we are given the same setting as in Problem (2.2.1, 2.2.2), the two-stage stochastic rotated quadratic cone program (SRQCP) in the primal standard form is the problem

37 min cTx + E [Q(x, ω)] s.t. Ax = b ˆ x N1 0, where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of the problem min d(ω)Ty s.t. W (ω)y = h(ω) − T (w)x ˆ y N2 0.

In §5.2 of Chapter 5, we describe two applications that illustrate the applicability of two-stage SRQCPs with recourse.

Problem 4. Stochastic semidefinite programs:

n1 n2 When J1 = S and J2 = S ; the spaces of real symmetric matrices of orders n1 and

1 n2, respectively, with Jordan multiplication X ◦ Y := 2 (XY + YX) and inner product

n1 n1 X •Y := trace(XY ), then KJ1 = S+ and KJ2 = S+ ; the cones of real symmetric positive semidefinite matrices of orders n1 and n2, respectively (see Example 19). We simply write

p the linear matrix inequality X  p 0 as X  0 (to mean that X ∈ S ). Hence the SSP S+ p + problem (2.1.1, 2.1.2) becomes the SSDP problem (see also [9, Subsection 2.1]):

min C • X + E [Q(X, ω)] s.t. AX = b (2.2.7) X  0, where Q(X, ω) is the minimum value of the problem

min D(ω) • Y s.t. W(ω)Y = h(ω) − T (ω)X (2.2.8) Y  0,

38 p p×p T p p2 T Euclidean Jordan alg. J S = {X ∈ R : X = X} vec(S ) = {vec(X) ∈ R : X = X} p p p p2 Symmetric cone KJ S+ = {X ∈ S : X  0} vec(S+) = {vec(X) ∈ R : X  0} 1 1 Jordan product x ◦ y X ◦ Y = 2 (XY + YX) vec(X) ◦ vec(Y ) = 2 vec(XY + YX) Inner product x • y X • Y = trace(XY ) vec(X) • vec(Y ) = vec(X)Tvec(Y ) The identity e Ip vec(Ip) 1 1 The matrix bXc 2 (I ⊗ X + X ⊗ I) 2 (I ⊗ X + X ⊗ I) The matrix dXe X ⊗ X X ⊗ X T T T T Spectral decomposition X = λ1 q1q1 +λ1 q2q2 + ··· vec(X) = λ1 vec(q1q1 ) +λ1 vec(q2q2 ) | {z } | {z } | {z } | {z } C1 C2 c1 c2 T T +λp qpqp + ··· + λp vec(qpqp ) | {z } | {z } Cp cp rank(KJ ) p p

Table 2.3: The Euclidean Jordan algebraic structure of the positive semidefinite cone in matrix and vectorized matrix forms. where X ∈ Sn1 is the first-stage decision variable, Y ∈ Sn2 is the second-stage variable, and Z E[Q(X, ω)] := Q(X, ω)P (dω). Ω

Here, clearly, C ∈ Sn1 , D ∈ Sn2 , and the linear operator A : Sn1 −→ Rm1 is defined by

AX := (A1 • X; A2 • X; ... ; Am1 • X)

n1 n2 m2 where Ai ∈ S for i = 1, 2, . . . , m1. The linear operators W : S −→ R and T : Sn2 −→ Rm1 are defined in a similar manner. Problem (2.2.7, 2.2.8) can also be written in a vectorized matrix form (see [25, Section 1]). See Table 2.3 for a summary of the Euclidean Jordan algebraic structure of the positive semidefinite cone in both matrix and vectorized matrix forms. Some applications leading to two-stage SSDPs with recourse can be found in [49].

Problem 5. Stochastic programming over complex Hermitian positive semidef- inite matrices: Recall that a square matrix with complex entries is called Hermitian if it is equal to its

39 own conjugate transpose, and that the eigenvalues of a (positive semidefinite) Hermitian matrix are always (nonnegative) real valued. Let Hp denote the space of complex Hermitian matrices of order p. Equipped with the Jordan multiplication X ◦ Y := (XY + YX)/2, the space Hp forms a Euclidean Jordan algebra. By following a similar argument to that in Example 19 (but using the spectral theorem for complex Hermitian matrices instead of using the spectral theorem for real symmetric matrices, see [43, Theorem 5.4.13]), we can show that the cone of squares of the space of complex Hermitian matrices of order p is the cone of complex Hermitian

p positive semidefinite matrices of orders p, denoted by H+. The Euclidean Jordan algebraic structure of the cone of complex Hermitian positive semidefinite matrices is analogous to that of the cone of real symmetric positive semidefinite matrices. Another subclass of stochastic symmetric programming problems is obtained when

n1 n2 n1 n1 J1 = H and J2 = H , and so we have that KJ1 = H+ and KJ2 = H+ ; the cones of complex Hermitian positive semidefinite matrices of orders n1 and n2, respectively. In fact, it is impossible to deal with the field of complex numbers directly as it is an unordered field. However, this can be overcome by defining a transformation that maps complex Hermitian matrices to symmetric matrices; see, for example, [35, 18].

Problem 6. Stochastic programming over quaternion Hermitian positive semidef- inite matrices: We first recall some notions of and quaternion matrices. The entries of the

field of real quaternions has the form x = x0 + x1i + x2j + x3k where x0, x1, x2, x3 ∈ R and i, j, and k are abstract symbols satisfying i2 = j2 = k2 = ijk = −1. The conjugate

∗ transpose of x is x = x0 − x1i − x2j − x3k. If A is a p × p quaternion matrix and A∗ is its conjugate transpose, then A is called Hermitian if A = A∗. Note that, if A is Hermitian, then w∗Aw is a real number for any p dimensional column vector w with quaternion components. The matrix A is called positive semidefinite if it is Hermitian

40 and w∗Aw ≥ 0 for any p dimensional column vector w with quaternion components. A (positive semidefinite) quaternion Hermitian matrix has (nonnegative) real eigenvalues. Let QHp denote the space of quaternion Hermitian matrices of order p. Equipped with the Jordan multiplication X ◦ Y := (XY + YX)/2, the space QHp forms a Euclidean Jordan algebra. It has been shown [46] that the spectral theorem holds also for quaternion matrices (i.e., any quaternion Hermitian matrix is conjugate to a real diagonal matrix). So, by following argument similar to that in Example 19, we can show that the cone of squares of the space of quaternion Hermitian matrices of order p is the cone of quaternion

p Hermitian positive semidefinite matrices of orders p, denoted by QH+. The Euclidean Jordan algebraic structure of the cone of complex Hermitian positive semidefinite matrices is analogues to that of the cone of real symmetric positive semidefinite matrices and that of the cone of complex Hermitian positive semidefinite matrices.

n1 n2 The last class of problems is obtained when J1 = QH and J2 = QH with X ◦Y :=

n1 n1 (XY + YX)/2. Hence KJ1 = QH+ and KJ2 = QH+ ; the cones of quaternion Hermitian

positive semidefinite matrices of orders n1 and n2, respectively. Similar to Problem (5), there is a difficulty in dealing with quaternion entries, but this difficulty can be overcome by defining a transformation that maps Hermitian matrices with quaternion entries to symmetric matrices; see, for example, [35, 18].

Of course, it is also possible to consider a “mixed problem” such as that one obtained

n1 n2 n1 n2 by selecting J1 = R and J2 = E , or by selecting J1 = E and J2 = S , etc. In many applications we find that the models lead to SSPs in dual standard form. The focus of this dissertation is on the SSP in dual standard form. In the next two chapters, we will focus on interior point algorithms for solving the SSP problem (2.1.3) and (2.1.4).

41 Chapter 3

A Class of Polynomial Logarithmic Barrier Decomposition Algorithms for Stochastic Symmetric Programming

In this chapter, we turn our attention to the study of logarithmic barrier decomposition- based interior point methods for the general SSP problem. Since our algorithm applies to all symmetric cones, this work extends Zhao’s work [48] which focuses particularly on SLP problem (Problem 1), and Mehrotra and Ozevin’s¨ work [25] which focuses particularly on the SSDP problem (Problem 4). More specifically, we use logarithmic barrier interior point methods to present a Bender’s decomposition-based algorithm for solving the SSP problem (2.1.3) and (2.1.4) and then prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first stage solutions. Our procedure closely follows that of [25] (which essentially follows the procedure of [48]), but our setting is much more general. The results of this

42 chapter have been submitted for publication [3].

3.1 The log barrier problem for SSPs

We begin by presenting the extensive formulation of the SSP problem (2.1.3, 2.1.4) and the log barrier [30] problems associated with them.

3.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let {(T (k),W (k), h(k), d(k)): k = 1, 2,...,K} be the set of the possible values of the random variables T (ω),

  (k) (k) (k) (k) W (ω), h(ω), d(ω) and let pk := P T (ω),W (ω), h(ω), d(ω) = T ,W , h , d be the associated probability for k = 1, 2,...,K. Then Problem (2.1.3, 2.1.4) becomes

K T X (k) max c x + pkQ (x) k=1 (3.1.1) s.t. Ax  b, KJ1 where, for k = 1, 2,...,K,Q(k)(x) is the maximum value of the problem

max d(k)Ty(k) (3.1.2) s.t. W (k)y(k)  h(k) − T (k)x, KJ2 where x ∈ Rm1 is the first-stage decision variable, and y(k) ∈ Rm2 is the second-stage

(k) (k) (k) variable for k = 1, 2,...,K. Now, for convenience we redefine d as d := pkd for

43 k = 1, 2,...,K, and rewrite Problem (3.1.1, 3.1.2) as

K X max cTx + Q(k)(x) k=1 (3.1.3) s.t. Ax + s = b s  0, KJ1 where, for k = 1, 2,...,K,Q(k)(x) is the maximum value of the problem

max d(k)Ty(k) s.t. W (k)y(k) + s(k) = h(k) − T (k)x (3.1.4) s(k)  0. KJ2

Let ν(k) be the second-stage dual multiplier. The dual of Problem of (3.1.4) is

min (h(k) − T (k)x) • ν(k) s.t. W (k)Tν(k) = d(k) (3.1.5) ν(k)  0. KJ2

The log barrier problem associated with Problem (3.1.3, 3.1.4) is

K X max η(µ, x) := cTx + ρ(k)(µ, x) + µ ln det s k=1 (3.1.6) s.t. Ax + s = b s 0, KJ1 where, for k = 1, 2, . . . , K, ρ(k)(µ, x) is the maximum value of the problem

max d(k)Ty(k) + µ ln det s(k) s.t. W (k)y(k) + s(k) = h(k) − T (k)x (3.1.7) s(k) 0. KJ2

44 (Here µ > 0 is a barrier parameter.) If for some k, Problem (3.1.7) is infeasible, then we

PK (k) define k=1 ρ (µ, x) := ∞. The log barrier problem associated with Problem (3.1.5) is the problem min (h(k) − T (k)x) • ν(k) − µ ln det ν(k) s.t. W (k)Tν(k) = d(k) (3.1.8) ν(k) 0, KJ2 which is the Lagrangian dual of (3.1.7). Because Problems (3.1.7) and (3.1.8) are, respec- tively, concave or convex, (y(k), s(k)) and ν(k) are optimal solutions to (3.1.7) and (3.1.8), respectively, if and only if they satisfy the following optimality conditions:

(k) (k) s ◦ ν = µ e2, W (k)y(k) + s(k) = h(k) − T (k)x, (3.1.9) W (k)Tν(k) = d(k), s(k) 0, ν(k) 0. KJ2 KJ2

The elements s(k) and ν(k) may not operator commute. So, the equality s(k) = µ ν(k)−1 may not hold (see Remark 1 in Chapter 1). In fact, we need to scale the optimality conditions (3.1.9) so that the scaled elements are simultaneously decomposed. Let p 0. From now on, with respect to p, we define KJ2

(k) −1 (k) (k) (k) −1 (k) −1 (k) (k) −1 (k) se := dp es , νe := dpeν , he := dp eh, Wf := dp eW , and Te := dp eT .

Recall that, dpedp−1e = dpedpe−1 = I. We have the following lemma and proposition.

Lemma 3.1.1 (Lemma 28, [36]). Let p be an invertible element in J2. Then s ◦ ν = µe2 if and only if se ◦ νe = µe2.

Proposition 3.1.1. (y, s, ν) satisfies the optimality conditions ( 3.1.1) if and only if

45 (y, se, νe) satisfies the relaxed optimality conditions:

(k) (k) se ◦ νe = µe2, Wf(k)y(k) + s(k) = he (k) − Te(k)x, e (3.1.10) (k)T (k) (k) Wf νe = d , (k) (k) s K 0, ν K 0. e J2 e J2

Proof. The proof follows from Lemma 3.1.1, Proposition 1.3.2, and the fact that dpe(KJ2 ) =

KJ2 , and likewise, as an operator, dpe(int(KJ2 )) = int(KJ2 ), because KJ2 is symmetric. 2

To our knowledge, this is the first time this effective way of scaling is being used for stochastic programing, but it was originally proposed by Monteiro [28] and Zhang [47] for DSDP, and after that generalized by Schmieta and Alizadeh [36] for DSP. With this change of variable Problem (3.1.6, 3.1.7) becomes

K T X (k) max ηe(µ, x) := c x + ρe (µ, x) + µ ln det s k=1 (3.1.11) s.t. Ax + s = b s 0, KJ1

(k) where, for k = 1, 2,...,K, ρe (µ, x) is the maximum value of the problem

(k)T (k) (k) max d y + µ ln det se (k) (k) (k) (k) (k) (3.1.12) s.t. Wf y + se = he − Te x (k) s K 0, e J2

46 and Problem (3.1.8) becomes

(k) (k) (k) (k) min (he − Te x) • νe − µ ln det νe (k)T (k) (k) (3.1.13) s.t. Wf νe = d (k) ν K 0, e J2

Note that the Problem (3.1.7) and Problem (3.1.12) have the same maximizer, but their optimal objective values are equal up to a constant. More precisely, the difference between their optimal objective values would be µ ln det p2 (see item 1 of Lemma (1.3.2)). Similarly, Problems (3.1.8) and (3.1.13) have the same minimizer but their optimal ob- jective values differ by the same value (µ ln det p2). The SSP (3.1.11, 3.1.12) can be equivalently written as a DSP:

= ηe(µ,x) z }| { K T X  (k)T (k) (k) max c x + µ ln det s + d y + µ ln det se k=1 | {z } = ρ(k)(µ,x) e (3.1.14) s.t. Ax + s = b

(k) (k) (k) (k) (k) Wf y + se = he − Te x, k = 1, 2,...,K (k) s K 0, s K 0, k = 1, 2,...,K. J1 e J2

2 In Subsection 3.1.2, we need to compute ∇xηe(µ, x) and ∇ ηe(µ, x) so that we can determine the Newton direction defined by

2 −1 ∆x := −{∇ ηe(µ, x)} ∇xηe(µ, x) for our algorithms. We shall see that each choice of p leads to a different search direction (see Algorithm 1). As we mentioned earlier, we are interested in the class of p for which the scaled elements are simultaneously decomposed. In view of Theorem 1.3.5, it is enough

(k) (k) to choose p so that se and νe operator commute. That is, we restrict our attention to

47 the following set of scalings:

(k) (k) (k) (k) C(s , ν ) := {p K 0 | s and ν operator commute}. J2 e e

We introduce the following definition [1].

Definition 3.1.1. The set of directions ∆x arising from those p ∈ C(s(k), ν(k)) is called the commutative class of directions, and a direction in this class is called a commutative direction.

It is clear that p = e may not be in C(s(k), ν(k)). The following choices of p shows that the set C(s(k), ν(k)) is not empty.

(k)1/2 (k) We may choose p = s and get se = e2. In fact, by using Lemma 1.3.2, it can be seen that (k) −1 (k) se = dp e s l m  2 = s(k)−1/2 s(k)1/2

l (k)−1/2m l (k)1/2m = s s e2  −1  l (k)1/2m l (k)1/2m = s s e2

= e2.

(k)−1/2 (k) We may choose p = ν and get νe = e2. To see this, note that by using Lemma 1.3.2 we have (k) (k) νe = dpe ν l m  2 = ν(k)−1/2 ν(k)1/2

l (k)−1/2m l (k)1/2m = ν ν e2  −1  l (k)1/2m l (k)1/2m = ν ν e2

= e2.

The above two choices of directions are well-known in commutative classes. These two choices form a class of Newton directions derived by Helmberg et al [20], Monteiro [28],

48 and Kojima et al [22], and referred to as the HRVW/KSH/M directions. It is interesting to mention that the popular search direction due to Nesterov-Todd (NT) is also in a

(k) (k) commutative class, because in this case one chooses p in such a way that νe = se . More precisely, in the NT direction we choose

−1/2 −1/2 l 1/2m l 1/2m −1/2 l −1/2m l 1/2m 1/2 p = ν(k) ν(k) s(k) = s(k) s(k) ν(k) ,

and, using Lemma 1.3.2, we get

dpe2ν(k) = dp2eν(k) l m l m −1/2−1 = ν(k)1/2 ν(k)1/2 s(k) ν(k) l m l m 1/2 = ν(k)−1/2 ν(k)1/2 s(k) ν(k) l m l m 1/2 l m = ν(k)−1/2 ν(k)1/2 s(k) ν(k)−1/2 ν(k)

  1/2 l (k)−1/2m l (k)1/2m (k) = ν ν s e2 l m l m = ν(k)−1/2 ν(k)1/2 s(k)

= s(k).

(k) (k) −1 (k) (k) Thus, νe = dpeν = dp es = se . We proceed by making some assumptions. First we define

F := x : Ax + s = b, s 0 ; 1 KJ1 (k)  (k) (k) (k) (k) (k) (k) (k) F (x) := y : W y + s = h − T x, s K 0 for k = 1, 2,...,K; f e e e e J2 (k)  (k) F2 := x: F (x) 6= ∅ for k = 1, 2,...,K; TK (k) F2 := k=1 F2 ; T F0 := F1 F2;

49  (1) (1) (1) (K) (K) (K) F := (x, s, γ) × (y , s , ν ,..., y , s , ν ): Ax + s = b, s K 0, e e e e J1 (k) (k) (k) (k) (k) (k) (k)T (k) (k) W y + s = h − T x, s K 0, W ν = d , f e e e e J2 f e (k) T PK (k)T (k) ν K 0, k = 1, 2,...,K; A γ + T ν = c . e J2 k=1 e e Here γ is the first-stage dual multiplier. Now we make

Assumption 3.1.1. The matrices A and Wf(k) for all k have full column rank.

Assumption 3.1.2. The set F is nonempty.

Assumption 3.1.1 is for convenience. Assumption 3.1.2 guarantees strong duality for first- and second-stage SSPs. In other words, it requires that Problem (3.1.14) and its dual have strictly feasible solutions. This implies that problems (3.1.11-3.1.14) have a unique

PK (k) solutions. Note that for a given µ > 0, k=1 ρe (µ, x) < ∞ if and only if x ∈ F2. Hence, the feasible region for (3.1.11) is described implicitly by F0. Throughout the paper we denote the optimal solution of the first-stage problem (3.1.11) by x(µ), and the solutions

(k) (k) (k) of the optimality conditions (3.1.10) by (y (µ, x), se (µ, x), νe (µ, x)). The following proposition establishes the relationship between the optimal solutions of Problems (3.1.11, 3.1.12) and those of Problem (3.1.14).

(1) (1) (K) Proposition 3.1.2. Let µ > 0 be fixed. Then (x(µ), s(µ); y (µ), se (µ); ··· ; y (µ), (K) se (µ)) is the optimal solution of (3.1.14) if and only if (x(µ), s(µ)) is the optimal solution (1) (1) (K) (K) of (3.1.11) and (y (µ), se (µ); ··· ; y (µ), se (µ)) are the optimal solutions for (3.1.12) for given µ and x = x(µ).

2 3.1.2 Computation of ∇xηe(µ, x) and ∇xxηe(µ, x)

2 In order to compute ∇xηe(µ, x) and ∇xxηe(µ, x), we need to determine the derivative of (k) (k) (k) (k) (k) (k) (k) ρe (µ, x) with respect to x. Let (y , νe , se ) := (y (µ, x), νe (µ, x), se (µ, x)). We

50 first note that from (3.1.10) we have that

(k) (k) (k) (k) (k) (k) (k) (k)T (k)T (k) (k) (k) (k)T (k) (he −Te x)•νe = (Wf y +se )•νe = y (Wf νe )+se •νe = y d +r2 µ,

where in the second equality we used the observation that

m2 m2 (k) (k) (k) X (k) (k) (k) X (k) (k) (k) (k)T (k)T (k) Wf y • νe = (yi wei ) • νe = yi (wei • νe ) = y (Wf νe ) i=1 i=1

(k) th (k) (here wei ∈ J2 is the i column of Wf ), and in the last equality we used that trace(e2) = rank(J2) = r2. This implies that

(k) (k) (k) (k) (k) (k) (k) (he − Te x) • νe − µ ln det νe = ρe (µ, x) − µ ln det se + r2 µ − µ ln det νe (k) (k) (k) = ρe (µ, x) + r2 µ − µ ln det(se ◦ νe ) (k) = ρe (µ, x) + r2 µ (1 − lnµ).

Thus,

(k) (k) (k) (k) (k) (3.1.15) ρe (µ, x) = (he − Te x) • νe − µ ln det νe − r2 µ (1 − lnµ).

Differentiating (3.1.15) and using the optimality conditions (3.1.10), we obtain

(k) (k) (k) (k) (k) ∇xρe (µ, x) = ∇x((he − Te x) • νe ) − µ ∇xln det νe (k) (k) T (k) (k) T (k) (k) (k) T (k)−1 = (∇x(he − Te x)) νe + (∇xνe ) (he − Te x) − µ (∇xνe ) νe (k)T (k) (k) T (k) (k) (k) (k) T (k)−1 = −Te νe + (∇xνe ) (Wf y + se ) − µ (∇xνe ) νe (k)T (k) (k) T (k) (k) (k) T (k) (k)−1 = −Te νe + (∇xνe ) (Wf y ) + (∇xνe ) (se − µ νe ) (k)T (k) (k) T (k) (k) = −Te νe + (∇xνe ) (Wf y )

(k) (k) (k) (k) th (k) From (3.1.10) we have that Wfi • νe = di , where di is the i entry of d . This implies that

51 m2 (k) T (k) (k) (k) T X (k) (k) (∇xνe ) (Wf y ) = (∇xνe ) yi wei i=1 m2 X (k) (k) T (k) = yi (∇xνe ) wei i=1 m2 X (k) (k) (k) = yi ∇x(νe • wei ) i=1 = 0.

Thus,

(k) (k)T (k) 2 (k) (k)T (k) ∇xρe (µ, x) = −Te νe and ∇xxρe (µ, x) = −Te ∇xνe .

(k) Therefore, we also need to determine the derivative of νe with respect to x. Differ- entiating (3.1.10) with respect to x, we get the system

(k) l (k)−1 m (k) ∇xνe = −µ se ∇xse , (k) (k) (k) (k) (3.1.16) Wf ∇xy + ∇xse = −Te , (k)T (k) Wf ∇xνe = 0.

Solving the system (3.1.16), we obtain

(k) l (k)1/2 m (k) l (k)−1/2 m (k) ∇xse = − se P se Te , l m l m (k) (k)−1/2 (k) (k)−1/2 (k) (3.1.17) ∇xνe = µ se P se Te , (k) (k)−1 (k)T l (k)−1 m (k) ∇xy = −R Wf se Te , where

(k) (k) (k)T l (k)−1 m (k) R := R (µ, x) = Wf se Wf , (3.1.18) (k) (k) l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m P := P (µ, x) = I − se Wf R Wf se .

Observing that, by differentiating Ax + s = b with respect to x, we get ∇xs = −A.

52 We then have

K X (k) T −1 ∇xηe(µ, x) = c − ∇xρe (µ, x) + µ (∇xs) s , k=1 K (3.1.19) X (k)T (k) T −1 = c − Te νe − µ A s , k=1 and

K 2 X (k)T (k) T  −1 ∇xxηe(µ, x) = − Te ∇xνe + µ A s ∇xs k=1 K X (k)T l (k)−1/2 m (k) l (k)−1/2 m (k) T  −1 (3.1.20) = −µ Te se P se Te − µ A s A. k=1

3.2 Self-concordance properties of the log-barrier re-

course

The notion of so called self-concordant functions introduced by Nesterov and Nemirovskii [30] allows us to develop polynomial time path following interior point methods for solving SSPs. In this section, we prove that the recourse function with log barrier is a strongly self-concordant function and leads to a strongly self-concordant family with appropriate parameters.

3.2.1 Self-concordance of the recourse function

This subsection is devoted to show that ηe(µ, ·) is a µ strongly self-concordant barrier on

F0. First, we have the following definition.

Definition 3.2.1 (Nesterov and Nemirovskii [30, Definition 2.1.1]). Let E be a finite- dimensional real vector space space, G be an open nonempty convex subset of E, and let

53 f be a C3, convex mapping from G to R. Then f is called α-self-concordant on G with the parameter α > 0 if for every x ∈ G and h ∈ E, the following inequality holds:

3 −1/2 2 3/2 |∇xxxf(x)[h, h, h]| ≤ 2α (∇xxf(x)[h, h]) . (3.2.1)

An α-self-concordant function f on G is called strongly α-self-concordant if f tends to infinity for any sequence approaching a boundary point of G.

We note that in the above definition the set G is assumed to be open. However, relative openness would be sufficient to apply the definition. See also [30, Item A, Page 57].

The proof of strongly self-concordance of the function ηe(µ, ·) relies on two lemmas that we state and prove below. We point out that the proof of the first lemma is shown

n1 n1 in [30, Proposition 5.4.5] for s ∈ S++ := int(S+ ), i.e., s lies in the interior of the cone of real symmetric positive semidefinite matrices of orders n1. We now generalize this proof for the case where s lies in the interior of an arbitrary symmetric cone of dimension n1, where n1, as mentioned before, is any positive integer.

Lemma 3.2.1. For any fixed µ > 0, the function f(s) := −µ ln det s is a µ strongly

self-concordant barrier on KJ1 .

∞ Proof. Let {si}i=1 be any sequence in KJ1 . It is clear that the function f(si) tends to

i ∞ when s approaches a point from boundary of KJ1 . It remains to show that (3.2.1) is satisfied for f(s) on K . Let s 0 and h ∈ J , then there exists an element u ∈ J J1 KJ1 1 1 such that s = u2, and therefore, by Lemmas 1.3.1 and 1.3.2, we have that

0 ∇sf(s)[h] = −µ{ln det (s + th)} |t=0

2 0 = −µ{ln det (u + th)} |t=0

−1 0 = −µ{ln det (due(e1 + tdue h))} |t=0

2 −1 0 = −µ{ln det (u ) + ln det (e1 + tdue h))} |t=0

54 trace(due−1h) = −µ det e1 = −µ(s−1 • h),

2 ∇ssf(s)[h, h] = ∇s(∇sf(s)[h]) • h

−1 = −µ (∇s(s • h)) • h

−1 = −µ (Ds(s ◦ h)) • h = µ (s−2 ◦ h) • h = µ{(s−1 ◦ h) • (s−1 ◦ h)},

3 2 ∇sssf(s)[h, h, h] = ∇s(∇ssf(s)[h, h]) • h

−1 −1 = µ (∇s((s ◦ h) • (s ◦ h))) • h

−1 −1 = µ (Ds((s ◦ h) ◦ (s ◦ h))) • h = −2µ ((s−2 ◦ h) ◦ (s−1 ◦ h)) • h = −2µ{(s−1 ◦ h) • ((s−1 ◦ h) ◦ (s−1 ◦ h))}.

ˆ −1 Let h = s ◦ h ∈ J1, and λ1, λ2, . . . , λp be its eigenvalues. The result is established by observing that

p ( p )3/2 ˆ3 X 3 X 2 ˆ2 3/2 2µ |trace(h )| = 2µ |λi| ≤ 2µ λi = 2µ {trace(h )} . 2 | {z } i=1 i=1 | {z } = |∇3 f(s)[h,h,h]| −1/2 2 3/2 sss = 2µ {∇xxf(s)[h,h]}

(k) Lemma 3.2.2. For any fixed µ > 0, ρe (µ, x) is a µ strongly self-concordant barrier on (k) F2 , k = 1, 2, ··· ,K.

∞ (k) (k) Proof. Let {xi}i=1 be any sequence in F2 . It is clear that the function ρe (µ, xi) tends i (k) to ∞ when x approaches a point from boundary of F2 . It remains to show that (3.2.1)

(k) (k) (k) m1 is satisfied for ρe (µ, x) on F2 . For any µ > 0, x ∈ F2 , and d ∈ R , we define the univariate function

(k) 2 (k) Ψ (t) := ∇xxρe (µ, x + td)[d, d].

55 (k) 2 (k) (k) 0 3 (k) Note that Ψ (0) = ∇xxρe (µ, x)[d, d] and Ψ (0) = ∇xxxρe (µ, x)[d, d, d]. So, to prove the lemma, it is enough to show that

2 |Ψ(k)(0)0| ≤ √ |Ψ(k)(0)|3/2. µ

(k) (k) (k) (k) (k) (k) (k) (k) Let (νe (t), se (t),P (t),R (t)) := (νe (µ, x+td), se (µ, x+td),P (µ, x+td),R (µ, x+ td)). We also define

(k) √ (k) l (k)−1/2 m (k) u (t) := µ P (t) se (t) T (t)d.

(k) (k) (k) (k) By the notations introduced in §4, we have (νe , se ) = (νe (0), se (0)). We also let (P (k),R(k), u(k)) := (P (k)(0),R(k)(0), u(k)(0)). Notice that, by the definition of P (k), we have P (k)2 = P (k). Using (3.1.17) and (3.1.18), we get

(k) 2 (k) Ψ (0) = ∇xxρe (µ, x)[d, d] (k)T (k) = −(Te ∇xνe )[d, d]  (k)T l (k)−1/2 m (k) l (k)−1/2 m (k) = −µ Te se (t) P se (t) Te [d, d] T  (k)T l (k)−1/2 m (k)2 l (k)−1/2 m (k) = −µ d Te se (t) P se (t) Te d = −u(k) • u(k) = −ku(k)k2.

Hence Ψ(k)(0)0 = −2u(k) • u(k)0. So, in order to bound |Ψ(k)(0)0|, we need to compute the derivative of u(k) with respect to t. Using (3.1.18), we have

0 (k)0 √ n (k) l (k)−1/2 m o (k) u = µ P se (t) T d 2 0 √ n l (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m o (k) = µ se (t) − se (t) Wf R Wf se (t) T d

56 0 0 2 √ nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m = µ se (t) − se (t) Wf R Wf se (t) 2 l (k)−1/2 m (k) (k)−1 0 (k)Tl (k)−1/2 m − se (t) Wf (R ) Wf se (t) 2 0 l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m  o (k) − se (t) Wf R Wf se (t) T d 0 0 2 √ nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m = µ se (t) − se (t) Wf R Wf se (t) 2 l (k)−1/2 m (k) (k)−1 (k) 0 (k)−1 (k)Tl (k)−1/2 m + se (t) Wf R (R ) R Wf se (t) 0 l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m − se (t) Wf R Wf se (t) se (t) 0 l (k)−1/2 m l (k)−1/2 m o (k) + se (t) se (t) T d 0 0 2 √ nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m = µ se (t) − se (t) Wf R Wf se (t) 0 l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m + se (t) Wf R Wf se (t) se (t) 0 2 l (k)−1/2 m l (k)−1/2 m  (k) (k)−1 (k)Tl (k)−1/2 m + se (t) se (t) Wf R Wf se (t) 0 l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m − se (t) Wf R Wf se (t) se (t) 0 l (k)−1/2 m l (k)−1/2 m o (k) + se (t) se (t) T d 0 2 √ nl (k)−1/2 m  (k) (k)−1 (k)Tl (k)−1/2 m  = µ se (t) I − Wf R Wf se (t) 0 l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m − se (t) Wf R Wf se (t) se (t) 0 2 l (k)−1/2 m l (k)−1/2 m  (k) (k)−1 (k)Tl (k)−1/2 m o (k) + se (t) se (t) − Wf R Wf se (t) + I T d 0 0 √ nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m = µ se (t) − se (t) Wf R Wf se (t) se (t) 0 2 l (k)−1/2 m l (k)−1/2 m o (k) (k)−1 (k)Tl (k)−1/2 m  (k) + se (t) se (t) I − Wf R Wf se (t) T d 0 0 √ nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m = µ se (t) − se (t) Wf R Wf se (t) se (t) 0 −1 l (k)−1/2 m l (k)−1/2 m ol (k)−1/2 m (k) l (k)−1/2 m  (k) + se (t) se (t) se (t) P se (t) T d 0 0 nl (k)−1/2 m l (k)−1/2 m (k) (k)−1 (k)T l (k)−1/2 m l (k)−1/2 m = se (t) − se (t) Wf R Wf se (t) se (t) 0 −1 l (k)−1/2 m l (k)−1/2 m ol (k)−1/2 m (k) + se (t) se (t) se (t) u .

Notice that, for any ξ ∈ Rm2 , we have

(k) l (k)−1/2 m (k)  (k) l (k)−1/2 m (k)  l (k)−1/2 m (k) u • se (t) Wf ξ = P se (t) T d • se (t) Wf ξ  l (k)−1/2 m (k)  (k) l (k)−1/2 m (k) = se (t) T d • P se (t) Wf ξ

57  l (k)−1/2 m (k)   l (k)−1/2 m (k) = se (t) T d • se (t) Wf ξ 2 l (k)−1/2 m (k) (k)−1 (k)Tl (k)−1/2 m (k)  − se (t) Wf R Wf se (t) Wf ξ | {z } = R(k) = 0.

This implies that

l m0l m−1  (k) 0 (k) (k)0 (k) (k)−1/2 (k)−1/2 (k) (3.2.2) Ψ (0) = 2u • u = 2u • se (t) se (t) u .

By (3.2.2), (3.1.17) and (3.1.18) and using norm inequalities, we get

0 0 l −1/2 m l 1/2 m |Ψ(k)(0) | = 2µ−1/2 u(k) • s(k) (t) s(k) (t) u(k) e e 0 0 l −1/2 m l 1/2 m l 1/2 ml −1/2 m  = µ−1/2 u(k) • s(k) (t) s(k) (t) + s(k) (t) s(k) (t) u(k) e e e e l ml ml m0 −1/2 (k) (k)1/2 (k)−1/2 (k)−1/2 = µ u • se (t) se (t) se (t) l m0l ml m (k)−1/2 (k)−1/2 (k)1/2 (k) + se (t) se (t) se (t) u l ml m20l m −1/2 (k) (k)1/2 (k)−1/2 (k)1/2 (k) = µ u • se (t) se (t) se (t) u l ml m0l m −1/2 (k) (k)1/2 (k)−1 (k)1/2 (k) = µ u • se (t) se (t) se (t) u l m l m −1/2 (k) (k)−1/2  (k) 0 (k)−1/2 (k) = µ u • νe (t) νe (t) νe (t) u l m l m −1/2 (k) (k)−1/2  (k) (k)0  (k)−1/2 (k) = 2µ u • νe (t) νe (t), νe (t) νe (t) u l m l m −1/2 (k) (k)−1/2  (k) (k) 0  (k)−1/2 (k) = 2µ u • νe (t) νe (t), {ν (µ, x + td)} |t=0 νe (t) u l m l m −1/2 (k) (k)−1/2  (k) (k)  (k)−1/2 (k) = 2µ u • νe (t) νe (t), ∇xνe [d] νe (t) u l m −1/2 (k) (k)−1/2 (k) (k)−1/2 (k) = 2µ u • e2, νe (t) ◦ ∇xνe [d] ◦ νe (t) u j k −1/2 (k) (k)−1/2 (k) (k)−1/2 (k) = 2µ u • νe (t) ◦ ∇xνe [d] ◦ νe (t) u j k −1/2 (k) (k)−1/2 (k) (k)−1/2 (k) ≤ 2µ ku k νe (t) ◦ ∇xνe [d] ◦ νe (t) u

−1/2 (k) 2 (k)−1/2 (k) (k)−1/2 ≤ 2µ ku k νe (t) ◦ ∇xνe [d] ◦ νe (t) l m −1/2 (k) 2 (k)−1/2 (k) = 2µ ku k νe (t) ∇xνe d l m −1 (k) 2 (k)1/2 (k) = 2µ ku k se (t) ∇xνe d

58 = 2µ−1/2ku(k)k2ku(k)k = 2µ−1/2kΨ(k)(0)k3/2. The lemma is established. 2

Theorem 3.2.1. For any fixed µ > 0, ηe(µ, x) is a µ strongly self-concordant barrier on

F0. Proof. It is trivial to see that the linear function cTx is a µ strongly self-concordant

barrier on F1 (indeed, both sides of (3.2.1) are identically zero). By Lemma 3.2.1, one

can see that the function µ ln det s is also a µ strongly self-concordant barrier on F1.

PK (k) By Lemma 3.2.2 and [30, Proposition 2.1.1(ii)], we can conclude that k=1 ρe (µ, x) is a µ strongly self-concordant barrier on F2. The theorem is then established by [30, Proposition 2.1.1(ii)]. 2

3.2.2 Parameters of the self-concordant family

We have shown that the recourse function ηe(µ, ·) is a strongly self-concordant function. Having such a property, this function enjoys many nice features. In this subsection, we show that the family of functions {ηe(µ, ·): µ > 0} is a strongly self-concordant family with appropriate parameters. We first introduce the definition of self-concordant families.

Definition 3.2.2 (Nesterov and Nemirovskii [30, Definition 3.1.1]). Let R++ be the set of

n all positive real numbers. Let G be an open nonempty convex subset of R . Let µ ∈ R++

and let fµ : R++ × G → R be a family of functions indexed by µ. Let α1(µ), α2(µ), α3(µ),

α4(µ) and α5(µ): R++ → R++ be continuously differentiable functions on µ. Then the

family of functions {fµ}µ∈R++ is called strongly self-concordant with the parameters α1,

α2, α3, α4, α5, if the following conditions hold:

(i) fµ is continuous on R++ × G, and for fixed µ ∈ R++, fµ is convex on G. fµ has

three partial derivatives on G, which are continuous on R++ × G and continuously

differentiable with respect to µ on R++.

59 (ii) For any µ ∈ R++, the function fµ is strongly α1(µ)-self-concordant.

n (iii) For any (µ, x) ∈ R++ × G and any h ∈ R ,

1 1 0 0 2 2  2 |{∇xfµ(µ, x)[h]} − {ln α3(µ)} ∇xfµ(µ, x)[h]| ≤ α4(µ)α1(µ) ∇xxfµ(µ, x)[h, h] ,

2 0 0 2 2 |{∇xxfµ(µ, x)[h, h]} − {ln α2(µ)} ∇xxfµ(µ, x)[h, h]| ≤ 2α5(µ)∇xxfµ(µ, x)[h, h].

0 In this section, we need to compute {∇xηe(µ, x)} , and in order to do so we need to (k) (k)0 (k)0 (k)0 determine the partial derivative of νe (µ, x) with respect to µ. Let (y , νe , se ) (k) (k) (k) denote the partial derivatives of (y (µ, x), νe (µ, x), se (µ, x)) with respect to µ. Dif- ferentiating (3.1.10) with respect to µ, we get the system

(k) (k)0 (k) (k)0 se ◦ νe + νe ◦ se = e2, (k) (k)0 (k)0 (3.2.3) Wf y + se = 0, (k)T (k)0 Wf νe = 0. Solving the system (3.2.3), we obtain

(k)0 (k) (k)−1 (k)T (k)−1 se = Wf R Wf se , l m (k)0 (k)−1/2 (k) (3.2.4) νe = se (t) P e2, (k)0 (k)−1 (k)T (k)−1 y = −R Wf se .

The proof of strongly self-concordance of the family {ηe(µ, ·): µ > 0} depends on the following two lemmas.

m1 Lemma 3.2.3. For any µ > 0, x ∈ F0 and h ∈ R , the following inequality holds:

−(r + Kr ) 1/2 |{∇ η(µ, x)T[h]}0| ≤ 1 2 ∇2 η(µ, x)[h, h] . xe µ xxe

60 Proof. By differentiating (3.1.19) with respect to µ and applying (3.2.4), we obtain

K 0 X (k)T l (k)−1/2 m (k) T −1 1 {∇xη(µ, x)} = − Te s (t) P e2 − A s = −√ Bε, e e µ k=1 where B ∈ Rm1×(Kn2+n1) is defined by

h√ (1)T l (1)−1/2 m (1) √ (k)T l (K)−1/2 m (K) √ T  −1/2 i B := µ T se (t) P ,..., µ Te se (t) P , µ A s and

ε := (e2; ... ; e2; e1) ∈ J2 × · · · × J2 ×J1. | {z } | {z } K times K times Notice that, in view of (3.1.20), we have

K T X (k)T l (k)−1/2 m (k) l (k)−1/2 m (k) T  −1 2 BB = µ Te se (t) P se (t) Te + µ A s A = −∇xxηe(µ, x). k=1

This gives

−{∇ η(µ, x)T}0{∇2 η(µ, x)}−1{∇ η(µ, x)}0 = 1 ε • BT[BBT]−1Bε xe xxe xe µ 1 ≤ µ ε • ε 1 = µ (e1 • e1 + Ke2 • e2).

Recall that e1 • e1 = rank(J1) = r1 and e2 • e2 = rank(J2) = r2. It follows that

(r + Kr ) −{∇ η(µ, x)T}0{∇2 η(µ, x)}−1{∇ η(µ, x)}0 ≤ 1 2 . (3.2.5) xe xxe xe µ

We then have

T 0 T 0 2 −1 01/2 2 1/2 |{∇xηe(µ, x) [h]} | ≤ −{∇xηe(µ, x) } {∇xxηe(µ, x)} {∇xηe(µ, x)} (−∇xxηe(µ, x)[h, h]) −(r + Kr ) 1/2 ≤ 1 2 ∇2 η(µ, x)[h, h] , as desired. 2 µ xxe

61 m1 Lemma 3.2.4. For any µ > 0, x ∈ F0 and h ∈ R , the following inequality holds:

√ 2 r |{∇2 η(µ, x)[h, h]}0| ≤ − 2 ∇2 η(µ, x)[h, h]. xxe µ xxe

(k) (k) (k) (k) (k) (k) (k) (k) Proof. Let (νe , se ,P ,R ) := (νe (µ, x), se (µ, x),P (µ, x),R (µ, x)). We fix h ∈ Rm1 and define (k) √ (k) l (k)−1/2 m (k) u := µ P se T h.

Now by following some similar steps as in the proof of Lemma 3.2.2, and using (3.2.2), (3.1.18), (3.2.3), (3.1.10), and (3.2.4), we get

(u(k) • u(k))0 = 2 u(k) • u(k)0 0 −1 (k) l (k)−1/2 m l (k)−1/2 m (k) = 2 u • se (t) se (t) u 0 (k) l (k)1/2 ml (k)−1 m l (k)1/2 m (k) = u • se (t) se (t) se (t) u (k) l (k)−1/2 m (k) 0l (k)−1/2 m (k) = u • νe (t) νe (t) νe (t) u (k) l (k)−1/2 m (k) (k)0 l (k)−1/2 m (k) = 2 u • νe (t) νe (t), νe (t) νe (t) u (k) l (k)−1/2 (k)0 (k)−1/2 m (k) = 2 u • e2, νe (t) ◦ νe (t) ◦ νe (t) u (k) j (k)−1/2 (k)0 (k)−1/2 k (k) = 2 u • νe (t) ◦ νe (t) ◦ νe (t) u (k) j (k)−1 (k)0 k (k) = 2 u • νe (t) ◦ νe (t) u −1 (k)  (k) (k)0  (k) = 2µ u • se (t) ◦ νe (t) u −1 (k)  (k) (k)0  (k) = 2µ u • e2 − νe (t) ◦ se (t) u −1 (k) j (k)−1 (k)0 k (k) = 2µ u • e2 − µ se (t) ◦ se (t) u l m −1 (k) 2 (k)−1/2 (k)0 ≤ 2µ ku k e2 − µ se (t) se (t) l m −1 (k) 2 (k)−1/2 (k) (k)−1 (k)T (k)−1 = 2µ ku k e2 − µ se (t) Wf R Wf se −1 (k) 2 (k) = 2µ ku k ke2 − µ (e2 − P e2)k

−1 (k) 2 ≤ 2µ ku k ke2k,

where the last inequality is obtained by observing that e − P (k)e  0, which can be 2 2 KJ2

62 immediately seen by noting that P (k)  I.

2 Recall that ke2k = rank(J2) = r2. This gives us

√ 2 r |(u(k) • u(k))0| ≤ 2 u(k) • u(k). µ

From (3.1.20), we have that

K 2 X (k) (k) T  −1 ∇xxηe(µ, x)[h, h] = − u • u − µ (A s A)[h, h]. k=1

Therefore, for any h ∈ Rn, we have

K 2 0 X (k) (k) 0 T  −1 |{∇xxηe(µ, x)[h, h]} | ≤ |(u • u ) | + (A s A)[h, h] k=1 √ K 2 r2 X ≤ u(k) • u(k) + (AT s−1 A)[h, h] µ k=1 √ ≤ − 2 r2 ∇2 η(µ, x)[h, h]. µ xxe

This completes the proof. 2

Theorem 3.2.2. The family {ηe(µ, ·): µ > 0} is a strongly self-concordant family with the following parameters

√ √ r + Kr r α (µ) = µ, α (µ) = α (µ) = 1, α (µ) = 1 2 , α (µ) = 2 . 1 2 3 4 µ 5 µ

Proof. It is clear that condition (i) of Definition 3.2.2 holds. Theorem 3.2.1 shows that condition (ii) is satisfied and Lemmas 3.2.4 and 3.2.3 show that condition (iii) is satisfied. 2

63 3.3 A class of logarithmic barrier algorithms for solv-

ing SSPs

In §3.2 we have established that the parametric functions ηe(µ, ·) constitute a strongly self-concordant family. Therefore, it is straightforward to develop primal path following interior point algorithms for solving SSP (3.1.3, 3.1.4). In this section we introduce a class of log barrier algorithms for solving this problem. This class is stated formally in Algorithm 1.

Algorithm 1 The Decomposition Algorithm for Solving SSP 0 0 Require:  > 0, γ ∈ (0, 1), θ > 0, β > 0, x ∈ F0 and µ > 0. x := x0, µ := µ0 while µ ≥  do for k = 1, 2,...,K do solve (3.1.9) to obtain (y(k), s(k), ν(k)) (k) (k) (k) (k) choose a scaling element p ∈ C(s , ν ) and compute (se , νe ) end for 2 −1 compute ∆x := −{∇xxηe(µ, x)} ∇xηe(µ, x) using (3.1.19) and (3.1.20) r 1 compute δ(µ, x) := − ∆xT∇2 η(µ, x)∆x using (3.1.20) µ xxe while δ > β do x := x + θ∆x for k = 1, 2,...,K do solve (3.1.9) to obtain (y(k), s(k), ν(k)) (k) (k) (k) (k) choose a scaling element p ∈ C(s , ν ) and compute (se , νe ) end for 2 −1 compute ∆x := −{∇xxηe(µ, x)} ∇xηe(µ, x) using (3.1.19) and (3.1.20) r 1 compute δ(µ, x) := − ∆xT∇2 η(µ, x)∆x using (3.1.20) µ xxe end while µ := γµ (k) (k) apply inverse scaling to (se , νe ) end while

0 0 Our algorithm is initialized with a starting point x ∈ F0 and a starting value µ > 0 for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a measure of the proximity of the current point x to the central path, and β as a threshold

64 for that measure. If the current x is too far away from the central path in the sense that δ > β, Newton’s method is applied to find a point close to the central path. Then the value of µ is reduced by a factor γ and the whole precess is repeated until the value of µ is within the tolerance . By tracing the central path as µ approaches zero, a strictly feasible -optimal solution to (3.1.11) will be generated.

3.4 Complexity analysis

In this section we present the complexity analysis for two variants of algorithms: short- step algorithms and long-step algorithms, which are controlled by the input value of γ in Algorithm 1. As mentioned in [25], the first part of the following proposition follows directly from the definition of self concordance and is due to [30, Theorem 2.1.1]. The second part is results from the first part and is given in [48] without proof.

2 −1 Proposition 3.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −{∇xxηe(µ, x)} ∇xηe(µ, x) q and δ := − 1 ∆xT∇2 η(µ, x)∆x. Then for δ < 1, τ ∈ [0, 1] and any h, h , h ∈ m2 we µ xxe 1 2 R have

2 T 2 T 2 −2 T 2 (i) −(1−τδ) h ∇xxηe(µ, x)h ≤ −h ∇xxηe(µ, x+τ∆x)h ≤ −(1−τδ) h ∇xxηe(µ, x)h,

T 2 2 −2 p T 2 (ii) |h1 (∇xxηe(µ, x + τ∆x) − ∇xxηe(µ, x))h2| ≤ [(1 − τδ) − 1] −h1 ∇xxηe(µ, x)h1 p T 2 −h2 ∇xxηe(µ, x)h2.

The following lemma is essentially Theorem 2.2.3 of [30] and describes the behavior

of the Newton method as applied to ηe(µ, ·).

Lemma 3.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined by q ∆x := −{∇2 η(µ, x)}−1∇ η(µ, x); δ := δ(µ, x) = − 1 ∆xT∇2 η(µ, x)∆x, x+ = x + xxe xe µ xxe q ∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ, x+) := − 1 ∆x+T∇2 η(µ, x+)∆x+. µ xxe Then the following relations hold:

65 √  δ 2 (i) If δ < 2 − 3, then δ(µ, x+) ≤ ≤ δ . 1 − δ 2 √ ¯ ¯ −1 (ii) If δ ≥ 2 − 3, then ηe(µ, x) − ηe(µ, x + θ∆x) ≥ µ(δ − ln(1 + δ)), where θ = (1 + δ) .

3.4.1 Complexity for short-step algorithm

In the short-step version of the algorithm, we decrease the barrier parameter by a factor √ th γ := 1 − σ/ r1 + Kr2, with σ < 0.1, in each iteration. The k iteration of the short- step algorithm is performed as follows: at the beginning of the iteration, we have µ(k−1) and x(k−1) on hand and x(k−1) is close to the central path, i.e., δ(µ(k−1), x(k−1)) ≤ β. After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that δ(µk, x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new point xk with δ(µk, xk) ≤ β. We now show that, in this class of algorithms, only one Newton step is sufficient for recentering after updating the parameter µ. For the purpose of proving this result, we present the following proposition which is a restatement of [30, Theorem 3.1.1].

 √  Proposition 3.4.2. Let χ (η; µ, µ+) := 1+r2 + r1+Kr2 lnγ−1. Assume that δ(µ, x) < κ e 2 κ κ and µ+ := γµ satisfies δ(µ, x) χ (η; µ, µ+) ≤ 1 − . κ e κ

Then δ(µ+, x) < κ. √ + Lemma 3.4.2. Let µ = γµ, where γ = 1 − σ/ r1 + Kr2 and σ ≤ 0.1, and let β = √ (2 − 3)/2. If δ(µ, x) ≤ β, then δ(µ+, x) ≤ 2β. √ Proof. Let κ := 2β = 2 − 3. Since δ(µ, y) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+ satisfies 1 δ(µ, y) χ (η; µ, µ+) ≤ ≤ 1 − . κ e 2 κ

By Proposition 3.4.2, we have δ(µ+, y) ≤ κ. 2

66 By Lemmas 3.4.1(i) and 3.4.2, we conclude that we can reduce the parameter µ by the √ factor γ := 1 − σ/ r1 + Kr2, σ < 0.1, at each iteration, and that only one Newton step is sufficient to restore proximity to the central path. So we have the following complexity result for short-step algorithms.

Theorem 3.4.1. Consider Algorithm 1 and let µ0 be the initial barrier parameter,  > √ 0 the stopping criterion, and β = (2 − 3)/2. If the starting point x0 is sufficiently close to the central path, i.e., δ(µ0, x0) ≤ β, then the short-step algorithm reduces the √ 0 barrier parameter µ at a linear rate and terminates with at most O( r1 + Kr2 ln(µ /)) iterations.

3.4.2 Complexity for long-step algorithm

In the long-step version of the algorithm, the barrier parameter is decreased by an arbi- trary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective function value, however, several damped Newton steps might be needed for restoring the proximity to the central path. The kth iteration of the long-step algorithms is performed as follows: at the beginning of the iteration we have a point x(k−1), which is sufficiently close to x(µ(k−1)), where x(µ(k−1)) is the solution to (3.1.11) for µ := µ(k−1). The barrier parameter is reduced from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then a search is started to find a point xk that is sufficiently close to x(µk). The long-step algorithm generates a finite sequence

k consisting of N points in F0, and we finally take x to be equal to the last point of this sequence. We want to determine an upper bound on N, the number of Newton iterations that are needed to find the point xk.

For µ > 0 and x ∈ F0, we define the function φ(µ, x) := ηe(µ, x(µ)) − ηe(µ, x), which k (k) th represents the difference between the objective value ηe(µ , x ) at the end of k iteration k k−1 th and the minimum objective value ηe(µ , x(µ )) at the beginning of k iteration. Then

67 our task is to find an upper bound on φ(µ+, x). To do so, we first give upper bounds on φ(µ, x) and φ0(µ, x) respectively. We have the following lemma.

˜ Lemma 3.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x(µ) − x and define

r 1 δ˜ := δ˜(µ, x) = − ∆˜ xT∇2 η(µ, x)∆˜ x. µ xxe

˜ For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:

! δ˜ φ(µ, x) ≤ µ + ln(1 − δ˜) , (3.4.1) 1 − δ˜

√ 0 ˜ |φ (µ, x)| ≤ − r1 + Kr2 ln(1 − δ). (3.4.2)

Proof. Z 1 ˜ T ˜ φ(µ, x) := ηe(µ, x(µ)) − ηe(µ, x) := ∇xηe(µ, x + τ∆x) ∆xdτ. 0

Since x(µ) is the optimal solution, we have

(3.4.3) ∇xηe(µ, x(µ)) = 0

Hence, with the aid of Proposition 3.4.1(i), we get

Z 1 Z τ ˜ T 2 ˜ ˜ φ(µ, x) = ∆x ∇xxηe(µ, x + α∆x)∆x dαdτ 0 1 Z 1 Z 1 ∆˜ xT∇2 η(µ, x)∆˜ x ≤ − xxe dαdτ 2 0 τ (1 − αδˆ) Z 1 Z 1 µδ˜2 = dαdτ 2 0 τ (1 − αδ˜) ! δ˜ = µ + ln(1 − δ˜) , 1 − δ˜ which establishes (3.4.1). Now, for any µ > 0, by applying the chain rule, using (3.4.3), and applying the

68 Mean-Value Theorem, we get

0 0 0 T 0 φ (µ, x) = η (µ, x(µ)) − η (µ, x) + ∇xη(µ, x(µ)) x (µ) e e e (3.4.4) 0 0 ˜ T ˜ = ηe (µ, x(µ)) − ηe (µ, x) = ∇xηe(µ, x + $∆x) ∆x, for some $ ∈ (0, 1). Hence

Z 1 0 0 ˜ T ˜ |φ (µ, x)| = ∇xηe (µ, x + τ∆x) ∆x dτ 0 Z 1 q ˜ T 2 ˜ ˜ ≤ −∆x ∇xxηe(µ, x + τ∆x)∆x 0 q 0 ˜ T 2 ˜ −1 0 ˜ −∇xηe (µ, x + τ∆x) [∇xxηe(µ, x + τ∆x)] ∇xηe (µ, x + τ∆x) dτ.

Then, by using (3.2.5) and Proposition 3.4.1(i), we obtain

q s Z 1 −∆˜ xT∇2 η(µ, x)∆˜ x xxe r1 + Kr2 |φ0(µ, x)| ≤ dτ 0 1 − τδ˜ µ √ s Z 1 δ˜ µ r + Kr = 1 2 dτ 0 1 − τδ˜ µ √ Z 1 δ˜ = r1 + Kr2 dτ 0 1 − τδ˜ √ ˜ = − r1 + Kr2 ln(1 − δ),

which establishes (3.4.2). 2

˜ ˜ Lemma 3.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma 3.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then

+ + + + ηe(µ , x(µ )) − ηe(µ , x) ≤ O(r1 + Kr2)µ .

Proof. Differentiating (3.4.4) with respect to µ, we get

00 00 00 0 T 0 (3.4.5) φ (µ, x) = ηe (µ, x(µ)) − ηe (µ, x) + ∇xη (µ, x(µ)) x (µ).

69 We will work on the right-hand side of (3.4.5) by verifying that the second term

00 ηe (µ, x) is nonnegative and then bounding the first and the last term. 00 PK 00 From the definition of ηe(x, µ), we have that ηe (µ, x) = k=1 ρk(µ, x). Differentiating

ρk(µ, x) with respect to µ and using (3.2.4) and (3.1.10) gives

0 (k)T (k)0 (k) (k)−1 (k)0 ρk(µ, x) = d y + ln det se + µ se • se (k) (k)T (k)−1 (k)T (k)−1 (k)−1 (k) (k)−1 (k)T (k)−1 = ln det se + d (−R Wf se ) + µ se • (Wf R Wf se ) (k) (k)T (k)−1 (k)T (k)−1 (k) (k) (k)−1 (k)T (k)−1 = ln det se + d (−R Wf se ) + ν • (Wf R Wf se ) (k) (k) (k)T (k) T (k)−1 (k)T (k)−1 = ln det se + (−d + Wf ν ) (R Wf se ) (k) = ln det se .

Observe that (I − P (k))2 = I − P (k) (remember P (k)2 = P (k)). Therefore, by differen-

0 tiating ρk(µ, x) with respect to µ and using (3.2.4) and (3.1.18), we obtain

00 (k)−1 (k)0 ρk(µ, x) = se • se (k)−1 (k) (k)−1 (k)T (k)−1 = se • (Wf R Wf se ) (k)−1 l (k)1/2 m (k) l (k)1/2 m (k)−1  = se • se I − P se se l m 2 (k) (k)1/2 (k)−1 = I − P se se ≥ 0.

00 Thus, for µ > 0 and x ∈ F0, ηe (µ, x) ≥ 0, and hence ηe(µ, x) is a convex function of µ. In addition, using (3.1.18), we also have

00 (k)−1  (k) (k)−1 l (k)1/2 m (k) l (k)1/2 m (k)−1  ρk(µ, x) = se • se se − se P se se (k)−1  (k) (k)−1 (k)−1 l (k)1/2 m (k) l (k)1/2 m (k)−1 = se • se se − se • se P se se l m 2 (k)−1  (k) (k)−1 (k) (k)1/2 (k)−1 = se • se se − P se se (k)−1  (k) (k)−1 ≤ se • se se (k)−1 (k) = se • se

70 −1 (k) (k) = µ νe • se −1 = µ trace(e2) r = 2 . µ Hence Kr η00(µ, x(µ)) ≤ 2 . (3.4.6) e µ

By differentiating (3.4.3) with respect to µ, we get

0 2 0 ∇xηe (µ, x(µ)) + ∇xxηe(µ, x(µ))x (µ) = 0,

or equivalently

0 2 −1 0 x (µ) = −{∇xxηe(µ, x(µ))} ∇xηe (µ, x(µ)).

Hence, by using (3.2.5), we have

0 T 0 0 T 2 −1 0 −∇xηe (µ, x(µ)) x (µ) = ∇xηe (µ, x(µ)) {∇xxηe(µ, x(µ))} ∇xηe (µ, x(µ)) 1 (3.4.7) ≤ (r + Kr ). µ 1 2

00 By combining (3.4.6) and (3.4.7), and using the fact that ηe (µ, x) ≥ 0, we obtain

r + 2Kr φ00(µ, x(µ)) ≤ 1 2 . (3.4.8) µ

Applying the Mean-Value Theorem and using Lemma 3.4.3 and (3.4.8) gives

Z µ+ Z τ φ(µ+, x) = φ(µ, x) + φ0(µ, x)(µ+ − µ) + φ00(υ, x) dυdτ ! µ µ ˜ √ δ ˜ ˜ + ≤ µ + ln(1 − δ) − r1 + Kr2 ln(1 − δ)(µ − µ ) 1 − δ˜ Z µ+ Z τ −1 +(r1 + 2Kr2) υ dυdτ µ µ

71 ! ˜ √ δ ˜ ˜ + = µ + ln(1 − δ) − r1 + Kr2 ln(1 − δ)(µ − µ ) 1 − δ˜

+ τ +(r1 + 2Kr2)(µ − µ ) ln µ ! ˜ √ δ ˜ ˜ + ≤ µ + ln(1 − δ) − r1 + Kr2 ln(1 − δ)(µ − µ ) 1 − δ˜

+ −1 +(r1 + 2Kr2)(µ − µ ) lnγ

(recall γ−1 = µ+/µ ≥ τ/µ). Since δ˜ and γ are constants, the lemma is established. 2

Notice that the previous lemma requires δ˜ < 1. However, evaluating δ˜ explicitly may not be possible. In the next lemma we will see that δ˜ is actually proportional to δ, which can be evaluated.

2 −1 Lemma 3.4.5. For any µ > 0 and x ∈ F0, let ∆x := −{∇xxηe(µ, x)} ∇xηe(µ, x) and ∆˜ x := x − x(µ). We denote

r 1 r 1 δ := δ(µ, x) = − ∆xT∇2 η(µ, x)∆x and δ˜ := δ˜(µ, x) = − ∆˜ xT∇2 η(µ, x)∆˜ x. µ xxe µ xxe

2 ˜ If δ < 1/6, then 3 δ ≤ δ ≤ 2δ.

2 ˜ Proof. Let H := ∇xxηe(µ, x) and g := ∇xηe(µ, x), and denote g¯ := g + H∆x. From (3.4.3), we have ∆x = −H−1g. This gives us ∆˜ x = ∆x + H−1g¯. We then have

r 1 δ˜ = − (∆x + H−1g¯)T∇2 η(µ, x)(∆x + H−1g¯) µ xxe v u 1 1 u T 2 −1 T 2 −1 ≤ t− ∆x ∇xxηe(µ, x)∆x − (H g¯) ∇xxηe(µ, x)(H g¯) (3.4.9) µ µ | {z } H r 1 ≤ δ + − g¯TH−1g¯, µ

where we used the triangle inequality to obtain the last inequality. Note that, by (3.4.3),

72 ˜ we have ∇xηe(µ, x − ∆x) = 0. Applying the Mean-Value Theorem gives

−hTg¯ = −hT(H∆˜ x + g)

T 2 ˜ = −h (∇xxηe(µ, x)∆x + ∇xηe(µ, x)) T 2 ˜ ˜ = −h (∇xxηe(µ, x)∆x − (∇xηe(µ, x − ∆x) − ∇xηe(µ, x))) T 2 2 ˜ ˜ = −h (∇xxηe(µ, x) − ∇xxηe(µ, x − (1 − $)∆x))∆x, for some $ ∈ (0, 1).

Now in view of Proposition 3.4.1(ii) we have

Z 1 T T 2 2 ˜ ˜ −h g¯ = − h (∇xxηe(µ, x) − ∇xxηe(µ, x − (1 − τ)∆x))∆x dτ 0 p √ Z 1 ≤ −∆˜ xTH∆˜x −hTHh ((1 − (1 − τ)δ˜)−2 − 1)dτ ! 0 δ˜ p √ = −∆˜ xTH∆˜x −hTHh 1 − δ˜ √ ! µ δ˜2 √ = −hTHh. 1 − δ˜

It can be verified that −g¯TH−1g¯ = max{hTHh − 2hTg¯ : h ∈ Rm1 }. It then follows that

( √ ! ) 2 µ δ˜2 √ µ δ˜4 −g¯TH−1g¯ ≤ max hTHh + −hTHh : h ∈ Rm1 = . (3.4.10) 1 − δ˜ (1 − δ˜)2

From (3.4.9) and (3.4.10), we obtain δ˜ ≤ δ+ δ˜2 , or equivalently, 2δ˜2−(1+δ) δ˜+δ ≥ 0. 1−δ˜ Therefore, the condition δ ≤ 1/6 implies that 12δ˜2 − 7 δ˜+ 1 ≥ 0, which in turn gives that δ˜ ≤ 1/3 = 2δ. From (3.4.9), by exchanging positions of ∆x and ∆˜ x and following the above steps, we get δ˜2 δ˜ δ˜ 3 δ ≤ δ˜ + , or equivalently, δ ≤ ≤ = δ.˜ ˜ ˜ 1 1 − δ 1 − δ 1 − 3 2 2 ˜ Thus, the condition δ < 1/6 implies that 3 δ ≤ δ ≤ 2δ. This completes the proof. 2

73 Combining Lemmas 3.4.1(ii), 3.4.4, and 3.4.5, we have the following complexity result for long-step algorithms.

Theorem 3.4.2. Consider Algorithm 1 and let µ0 be the initial barrier parameter,  > 0 the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the central path, i.e., δ(µ0, x0) ≤ β, then the long-step algorithm reduces the barrier parameter

0 µ at a linear rate and terminates with at most O((r1 + Kr2) ln(µ /)) iterations.

74 Chapter 4

A Class of Polynomial Volumetric Barrier Decomposition Algorithms for Stochastic Symmetric Programming

Ariyawansa and Zhu [10] have derived a class of polynomial time decomposition-based algorithms for solving the SSDP problem (Problem 4) based on a volumetric barrier analogous to work of Mehrotra and Ozevin¨ [25] by utilizing the work of Anstreicher [7] for DSDP. In this chapter, we extend Ariyawansa and Zhu’s work [10] to the case of SSPs by deriving a class of volumetric barrier decomposition algorithms for the general SSP problem and establishing polynomial complexity of certain members of the class of algorithms. The results of this chapter have been submitted for publication [4].

75 4.1 The volumetric barrier problem for SSPs

In this section we formulate appropriate volumetric barrier function for the SSP problem (with finite event space Ω) and obtain expressions for the derivatives required in the rest of the paper. Our procedure closely follows that in §3 of [10], although our setting is much more general.

4.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let {(T (k),W (k), h(k), d(k)): k = 1, 2,...,K} be the set of the possible values of the random variables T (ω),W (ω), h(ω),

  (k) (k) (k) (k) d(ω) and let pk := P T (ω),W (ω), h(ω), d(ω) = T ,W , h , d be the associated probability for k = 1, 2,...,K. Then Problem (2.1.3, 2.1.4) becomes

K T X (k) max c x + pkQ (x) k=1 (4.1.1) s.t. Ax  b, KJ1 where, for k = 1, 2,...,K,Q(k)(x) is the maximum of the problem

max d(k)Ty(k) (4.1.2) s.t. W (k)y(k)  h(k) − T (k)x, KJ2 where x ∈ Rm1 is the first-stage decision variable, and y(k) ∈ Rm2 is the second-stage variable for k = 1, 2,...,K. We notice that the constraints in (4.1.1, 4.1.2) are negative symmetric while the com- mon practice in the DSP literature is to use positive symmetric constraints. So for conve-

(k) (k) (k) nience we redefine d as d := pkd for k = 1, 2,...,K, and rewrite Problem (4.1.1,

76 4.1.2) as K X min cTx + Q(k)(x) k=1 (4.1.3) s.t. Ax − b  0, KJ1 where, for k = 1, 2,...,K,Q(k)(x) is the maximum of the problem

min d(k)Ty(k) (4.1.4) s.t. W (k)y(k) + T (k)x − h(k)  0. KJ2

In the rest of this paper our attention will be on Problem (4.1.3, 4.1.4), and from now on when we use the acronym “SSP” in this paper we mean Problem (4.1.3, 4.1.4).

4.1.2 The volumetric barrier problem for SSPs

In this section we formulate a volumetric barrier for SSPs and obtain expressions for the derivatives required in our subsequent development. In order to define the volumetric barrier problem for the SSP (4.1.3, 4.1.4), we need to make some assumptions. First we define

F := x : s (x) := Ax − b 0 ; 1 1 KJ1 F (k)(x) := y(k) : s(k)(x, y(k)) := W (k)y(k) + T (k)x − h(k) 0 for k = 1, 2,...,K; 2 KJ2  (k) F2 := x: F (x) 6= ∅, k = 1, 2,...,K ; T F0 := F1 F2.

Now we make

Assumption 4.1.1. The matrix A and every matrix T (k) have full column rank.

Assumption 4.1.2. The set F0 is nonempty.

Assumption 4.1.3. For each x ∈ F0 and for k = 1, 2,...,K, Problem (4.1.4) has a nonempty isolated compact set of minimizers.

77 Assumption 4.1.1 is for convenience. Under Assumption 4.1.2, the set F1 is nonempty.

The logarithmic barrier [30] for F1 is the function l1 : F1 → R defined by

l1(x) := − ln det(s1(x)), ∀ x ∈ F1,

and the volumetric barrier [30, 40] for F1 is the function v1 : F1 → R defined by

1 v (x) := ln det(∇2 l (x)), ∀ x ∈ F . 1 2 xx 1 1

(k) Also under Assumption 4.1.2, F2 is nonempty and for x ∈ F2, F (x) is nonempty for k =

(k) (k) (k) 1, 2,...,K. The logarithmic barrier [30] for F (x) is the function l2 : F2 ×F (x) → R defined by

(k) (k) (k) (k) (k) (k) l2 (x, y ) := −ln det(s2 (x, y )), ∀ y ∈ F (x), x ∈ F2,

(k) (k) (k) and the volumetric barrier [30, 40] for F (x) is the function v2 : F2 × F (x) → R defined by

1 v(k)(x, y(k)) := ln det(∇2 l(k)(x, y(k))), ∀ y(k) ∈ F (k)(x), x ∈ F . 2 2 y(k)y(k) 2 2

Now, we define the volumetric barrier problem for the SSP (4.1.3, 4.1.4) as

K T X min η(µ, x) : = c x + ρk(µ, x) + µc1v1(x), (4.1.5) k=1

where for k = 1, 2,...,K and x ∈ F0, ρk(µ, x) is the minimum of the problem

(k)T (k) (k) (k) min d y + µc2v2 (x, y ). (4.1.6)

78 √ 3 Here c1 := 225 n1 and c2 := 450n2 are constants, and µ > 0 is the barrier parameter.

Now, we will show that (4.1.6) has a unique minimizer for each x ∈ F0 and for k = 1, 2,...,K. For this purpose, we present the following theorem.

Theorem 4.1.1 (Fiacco and Mccormick [19, Theorem 8]). Consider the inequality con- strained problem min f(x) (4.1.7) s.t. gi(x) ≥ 0, i = 1, 2, . . . , m,

n where the functions f, g1, . . . , gm : R → R are continuous. Let I be a scalar-valued function of x with the following two properties: I(x) is continuous in the region R0 :=

{x : gi(x) > 0, i = 1, 2, . . . , m}, which is assumed to be nonempty; if {xk} is any infinite

0 sequence of points in R converging to xB such that gi(xB) = 0 for at least one i, then

limk→∞ I(xk) = +∞. Let τ be a scalar-valued function of the single variable s with the

following two properties: if s1 > s2 > 0, then τ(s1) > τ(s2) > 0; if {sk} is an infinite

0 + sequence of points such that limk→∞ sk = 0, then limk→∞ τ(sk) = 0. Let U : R ×R → R be defined by U(x, s) := f(x)+τ(s)I(x). If (4.1.7) has a nonempty, isolated compact set of

local minimizers and {sk} is a strictly decreasing infinite sequence, then the unconstrained

local minimizers of U(·, sk) exist for sk small.

Lemma 4.1.1. If Assumptions 4.1.2 and 4.1.3 hold, then for each x ∈ F0 and k = 1, 2,...,K, the Problem (4.1.6) has a unique minimizer for µ small.

(k) (k) (k) Proof. For any x ∈ F0, v2 (x, y ) is defined on the nonempty set F (x). By Theorem (k) (k) (k) (k) (k) (k) 1.3.1, there exist real numbers λ1 , λ2 , ··· , λr2 and a Jordan frame c1 , c2 , ··· , cr2

(k) (k) (k) (k) (k) (k) (k) (k) (k) (k) (k) such that s2 (x, y ) = λ1 c1 + λ2 c2 + ··· + λr2 cr2 , and λ1 , λ2 , ··· , λr2 are the

(k) (k) (k) (k) (k) eigenvalues of s2 (x, y ). Moreover, λj can be viewed as a function of y ∈ F (x) (k) for j = 1, 2, . . . , r2. Then λj is continuous for j = 1, 2, . . . , r2 and hence the constraint s(k)(x, y(k)) 0 can be replaced by the constraints: λ(k)(y(k)) > 0, j = 1, 2, . . . , r . 2 KJ2 j 2 So (4.1.4) can be rewritten in the form of (4.1.7). Therefore, by Theorem 4.1.1, local

79 minimizers of (4.1.6) exist for each x ∈ F0 and k = 1, 2,...K for µ small. The uniqueness

(k) of the minimizer follows from the fact that v2 is strictly convex. 2

In view of Lemma 4.1.1, Problem (4.1.5) is well-defined, and its feasible set is F0.

2 4.1.3 Computation of ∇xη(µ, x) and ∇xxη(µ, x)

In order to compute the derivatives of η we need to determine the derivatives of ρk, k =

(k) (k) 1, 2,...,K, which in turn require the derivatives of v2 and l2 for k = 1, 2,...,K. We will drop the superscript (k) when it does not lead to confusion. Note that

∇xs1(x) = A, ∇xs2(x, y) = T, and ∇ys2(x, y) = W.

Hence

T −1 T −1 T −1 T −1 ∇xl1(x) = −(∇xs1) s1 = −A s1 , and ∇yl2(x, y) = −(∇ys2) s2 = −W s2 .

This implies that

2 T −1 T −1 ∇xxl1(x) = A ds1 e∇xs1 = A ds1 eA, and

2 T −1 T −1 H := ∇yyl2(x, y) = W ds2 e∇xs2 = W ds2 eW.

We need the matrix calculus result for our computation.

Proposition 4.1.1. Let X ∈ Rn×n be nonsingular . Then

∂ −1 −1 T −1 X = −X eiej X , ∂Xij

th n for i, j = 1, 2, . . . , n. Here, ei is the i vector in the standard basis for R .

80 To compute the first partial derivatives of v2(x, y), we start by observing that

∂ X ∂ ∂ s−1 = s−1 ds e ∂x 2 ∂ ds e 2 ∂x 2 ij k ij 2 ij k X ∂ −1 ∂ = ds e ds e ∂ ds e 2 ∂x 2 ij ij 2 ij k   X −1 −1 ∂ = −2 ds e e eT ds e s , s (4.1.8) 2 i j 2 2 ∂x 2 ij k ij X −1 T −1 = −2 ds2e eiej ds2e ds2, tkeij ij  −1  −1 = −2 s2 ds2, tke s2 .

Then, for i = 1, 2, . . . , m1, we have

∂ 1 ∂ v2(x, y) = 2 ln det H ∂xi ∂xi 1 −1 ∂ = 2 H • H ∂xi ∂ = 1 H−1 • (W Tds−1eW ) 2 ∂x 2 i   1 −1 T ∂ −1 = 2 H • W ds2 e W ∂xi −1 T −1 −1 = −H • W ds2e ds2, tie ds2e W

T −1 −1 −1 = −W H W • ds2e ds2, tie ds2e

l −1/2m T −1 l −1/2m l −1/2m l −1/2m = − s2 W H W s2 • s2 ds2, tie s2

By defining

l −1/2m T −1 l −1/2m P := s2 W H W s2 ,

l −1/2m which acts as the orthogonal projection onto the range of s2 W , we get

∂ T −1  −1  −1 l −1/2m l −1/2m v2(x, y) = −W H W • s2 ds2, tie s2 = −P • s2 ds2, tie s2 , ∂xi (4.1.9) for i = 1, 2, . . . , m1.

81 Similarly, we have

∂ T −1  −1  −1 l −1/2m l −1/2m v2(x, y) = −W H W • s2 ds2, wie s2 = −P • s2 ds2, wie s2 , ∂yi

for i = 1, 2, . . . , m2.

To compute the second partial derivatives of v2(x, y), we start by observing that

∂ X ∂ ∂ H−1 = H−1 H ∂x ∂H ∂x ij k ij ij k X −1 T −1 ∂ h T  −1 i = − (H eiej H ) W s2 W ∂x ij ij k h  ∂  i X −1 T −1 T  −1 (4.1.10) = − (H eiej H ) W s2 W ∂x ij ij k X −1 T −1 h T −1 −1 i = 2 (H eiej H ) W ds2e ds2, tke ds2e W ij ij −1 T  −1  −1 −1 = 2 H W s2 ds2, tke s2 WH ,

and, using (4.1.8) and Lemma 1.3.2, that

∂  ∂   ∂  s−1 ds , t e s−1 = s−1 ds , t e s−1 + s−1 ds , t e s−1 ∂x 2 2 i 2 ∂x 2 2 i 2 2 ∂x 2 i 2 j j   j  −1 ∂  −1 + s2 ds2, tie s2 ∂xj  −1  −1  −1  −1  −1 = −2 s2 ds2, tje s2 ds2, tie s2 + s2 dtj, tie s2  −1  −1  −1 −2 s2 ds2, tie s2 ds2, tje s2  −1  −1  −1 = −2 s2 ds2, tje s2 ds2, tie s2  −1   −1   −1 + s2 dtj, tie − 2 ds2, tie s2 ds2, tje s2 . (4.1.11) By combining (4.1.9), (4.1.10) and (4.1.11), we get

∂2 ∇2 v (y, x) = v (x, y) = 2Qxx + Rxx − 2T xx, (4.1.12) xx 2 ∂x∂x 2

82 where

xx −1 T  −1  −1  −1 Qi,j = (WH W ) • s2 ds2, tje s2 ds2, tie s2 ,

xx −1 T  −1   −1   −1 Ri,j = (WH W ) • s2 2 ds2, tie s2 ds2, tje − dtj, tie s2 ,

xx −1 T  −1  −1 −1  −1  −1 Ti,j = (WH W ) • s2 ds2, tje s2 WH W s2 ds2, tie s2 .

By following similar steps as above, we have that

∂2 ∇2 v (y, x) = v (x, y) = 2Qyy + Ryy − 2T yy, yy 2 ∂y∂y 2

where

yy −1 T  −1  −1  −1 Qi,j = (WH W ) • s2 ds2, wje s2 ds2, wie s2 ,

yy −1 T  −1   −1   −1 Ri,j = (WH W ) • s2 2 ds2, wie s2 ds2, wje − dwj, wie s2 ,

yy −1 T  −1  −1 −1  −1  −1 Ti,j = (WH W ) • s2 ds2, wje s2 WH W s2 ds2, wie s2 ;

∂2 ∇2 v (x, y) = v (x, y) = 2Qxy + Rxy − 2T xy, xy 2 ∂y∂x 2

where

xy −1 T  −1  −1  −1 Qi,j = (WH W ) • s2 ds2, wje s2 ds2, tie s2 ,

xy −1 T  −1   −1   −1 Ri,j = (WH W ) • s2 2 ds2, tie s2 ds2, wje − dwj, tie s2 ,

xy −1 T  −1  −1 −1  −1  −1 Ti,j = (WH W ) • s2 ds2, wje s2 WH W s2 ds2, tie s2 ;

and ∂2 ∇2 v (y, x) = v (x, y) = 2Qyx + Ryx − 2T yx, yx 2 ∂x∂y 2

where

yy −1 T  −1  −1  −1 Qi,j = (WH W ) • s2 ds2, tje s2 ds2, wie s2 ,

83 yy −1 T  −1   −1   −1 Ri,j = (WH W ) • s2 2 ds2, wie s2 ds2, tje − dtj, wie s2 ,

yx −1 T  −1  −1 −1  −1  −1 Ti,j = (WH W ) • s2 ds2, tje s2 WH W s2 ds2, wie s2 .

+ (k) Now, define ϕk : R × F0 × F (x) −→ R by

T ϕk(µ, x, y) := d y + µc2v2(x, y).

By (4.1.6) we then have

ρk(µ, x) = min ϕk(µ, x, y) y∈F (k)(x) and

ρk(µ, x) = ϕk(µ, x, y)|y=y¯ = ϕk(µ, x, y¯),

where y¯ is the minimizer of (4.1.6). Observe that y¯ is a function of x and is defined by

∇yϕk(µ, x, y)|y=y¯ = 0. (4.1.13)

1 Note that, by (4.1.13), we have ∇y¯v2(x, y¯) = ∇yv2(x, y)|y=y¯ = − d. This implies that µc2

2 3 2 ∇yx¯ v2(x, y¯) = ∇x∇y¯v2(x, y¯) = 0 and ∇yxx¯ v2(x, y¯) = ∇xx∇y¯v2(x, y¯) = 0.

Now we are ready to calculate the first and second order derivatives of ρk with respect to x. We have

∇xρk(µ, x) = [∇xϕk(µ, x, y) + ∇yϕk(µ, x, y) ∇xy]|y=y¯

= ∇xϕk(µ, x, y)|y=y¯ + ∇yϕk(µ, x, y)|y=y¯ ∇xy|y=y¯

= ∇xϕk(µ, x, y)|y=y¯

= µc2∇xv2(x, y)|y=y¯

84 = µc2∇xv2(x, y¯),

2 ∇xxρk(µ, x) = µ∇x{∇xv2(x, y¯)}

2 2 = µc2{∇xxv2(x, y¯) + ∇yx¯ v2(x, y¯)[∇xy]|y=y¯}

2 = µc2∇xxv2(x, y¯),

3 2 ∇xxxρk(µ, x) = µ∇x{∇xxv2(x, y¯}

3 3 = µc2{∇xxxv2(x, y¯) + ∇yxx¯ v2(x, y¯)[∇xy]|y=y¯}

3 = µc2∇xxxv2(x, y¯).

In summary we have

(k) (k) ∇xρk(µ, x) = µc2∇xv2 (x, y¯ ),

2 2 (k) (k) ∇xxρk(µ, x) = µc2∇xxv2 (x, y¯ ), (4.1.14)

3 3 (k) (k) ∇xxxρk(µ, x) = µc2∇xxxv2 (x, y¯ ), and K X (k) (k) ∇xη(µ, x) = c + µc1∇xv1(x) + µc2∇xv2 (x, y¯ ), k=1 K (4.1.15) 2 2 X 2 (k) (k) ∇xxη(µ, x) = µc1∇xxv1(x) + µc2∇xxv2 (x, y¯ ), k=1

(k) (k) 2 (k) (k) 3 (k) (k) where ∇xv2 (x, y¯ ), ∇xxv2 (x, y¯ ), and ∇xxxv2 (x, y¯ ) are calculated in (4.1.9), (4.1.12), and (4.2.4) respectively.

4.2 Self-concordance properties of the volumetric bar-

rier recourse

In this section we prove that the recourse function with volumetric barrier is a strongly self-concordant function leading to a strongly self-concordant family with appropriate

85 parameters. Establishing this allows us to develop volumetric barrier polynomial time path following interior point methods for solving SSPs. We need the following proposition for proving some results.

Proposition 4.2.1. Let A, B, C ∈ Rn×n. Then

1. A, B  0 implies that A • B ≥ 0;

2. if A  0 and B  C, then A • B ≥ A • C.

4.2.1 Self-Concordance of η(µ, ·)

This subsection is devoted to show that η(µ, ·) is a strongly self-concordant barrier on F0

(see Definition 3.2.1). Throughout this subsection, with respect to h ∈ Rm1 , we define

m1 X ¯ l −1/2m l −1/2m b := b(h) := hiti and b := s2 ds2, be s2 e2. i=1

Our proof relies on the following lemmas.

Lemma 4.2.1. Let (x, y) be such that s (x, y) 0. Then we have 2 KJ2

xx 2 0  Q  ∇xxv2(x, y). (4.2.1)

Proof. Let h ∈ Rm1 , h 6= 0. We have

T xx X xx h Q h = Qij hihj i,j −1 T X   −1  −1  −1  = (WH W ) • s2 ds2, tje s2 ds2, tie s2 hihj i,j −1 T  −1  X  −1   −1 = (WH W ) • s2 ds2, hjtje s2 ds2, hitie s2 i,j −1 T  −1 h X i  −1 h X i  −1 = (WH W ) • s2 ds2, hjtje s2 ds2, hitie s2 j i −1 T  −1  −1  −1 = (WH W ) • s2 ds2, be s2 ds2, be s2

86 2 l −1/2m −1 T l −1/2m l −1/2m l −1/2m = s2 WH W s2 • s2 ds2, be s2 = P • b¯2 .

Similarly we have

T xx −1 T   −1  −1  −1  −1  −1  h R h = (WH W ) • 2 s2 ds2, be s2 ds2, be s2 − s2 dbe s2

l −1/2m −1 T l −1/2m = s2 WH W s2 2  l −1/2m l −1/2m l −1/2m l −1/2m  • 2 s2 ds2, be s2 − s2 dbe s2 = P • b¯ , and

T xx −1 T −1  −1 −1  −1  −1 h T h = (WH W ) • ds2e ds2, be s2 WH W s2 ds2, be s2 l −1/2m l −1/2m l −1/2m l −1/2m = P • s2 ds2, be s2 P s2 ds2, be s2 = P • b¯ P b¯ .

Using Proposition 4.2.1 and observing that P  0 and b¯2  0, we conclude that Qxx  0. Since P is a projection, we have that I − P  0 and therefore

1 b¯ P b¯  b¯2 = b¯2 + b¯ . (4.2.2) 2

This implies that 1 P • b¯ P b¯ ≤ P • b¯2 + b¯ , 2 1 which is exactly hTT xxh ≤ hT(Qxx + Rxx)h. Since h is arbitrary, we have shown that 2 1 T xx  (Qxx + Rxx), which together with Qxx  0 establishes the result. 2 2

87 Lemma 4.2.2. For any h ∈ m1 , and (x, y) be such that s (x, y) 0. Then R 2 KJ2

√ ¯ 3/2 T xx 1/2 kbk ≤ 2 r2 (h Q h) . (4.2.3)

Proof. By Theorem 1.3.1, there exist real numbers λ1, λ2, ··· , λr2 and a Jordan frame ¯ c1, c2, ··· , cr2 such that b = λ1c1 +λ2c2 +···+λr2 cr2 , and λ1, λ2, ··· , λr2 are the eigenval- ues of b¯. Without loss of generality (scaling h as needed, and re-ordering indices), we may

¯2 assume that 1 = |λ1| ≥ |λ2| ≥ ... ≥ λr2 . In view of Lemma 1.3.3, the matrix b has a

2 2 full set of orthonormal eigenvectors cij with corresponding eigenvalues (1/2)(λi + λj ), for

1 ≤ i ≤ j ≤ r2. It follows that

" r # 1 X2 hTQxxh = P (λ2 + λ2) c cT 2 i j ij ij i,j=1 r 1 X2 = (λ2 + λ2) cT P c . 2 i j ij ij i,j=1

Recall that P is a projection onto an m2-dimensional space. So, we can write P as

m2 X T P = ulul , l=1

where u1, u2,..., um2 are the orthonormal eigenvectors of P corresponding to the nonzero

eigenvalues of P . Consider uk for some k, we have

r X2 uk = αijcij, i,j=1

for some constants αij, for i, j = 1, 2, . . . , r2, and

r2 r2 r2 X X X 1 = kukk = αijcij ≤ kαijcijk = |αij|. i,j=1 i,j=1 i,j=1

88 This means that there exist ik, jk such that

1 |αikjk | ≥ 2 . r2

Thus m ! 1 X X2 hTQxxh = (λ2 + λ2)cT u uT c 2 i j ij l l ij i,j l=1 m 1 X X2 = (λ2 + λ2) cT u uTc 2 i j ij l l ij i,j l=1 m 1 X X2 = (λ2 + λ2) kuTc k2 2 i j l ij i,j l=1 1 X ≥ (λ2 + λ2)kuTc k2 2 i j k ij i,j 1 X = (λ2 + λ2)|α |2 2 i j ikjk i,j 1 X 1 ≥ (λ2 + λ2) 2 i j r4 i,j 2 1 X ≥ (λ2 + λ2) 2r4 i j 2 j 1 X ≥ λ2 2r4 i 2 j 1 X = kb¯k2 2r4 2 2 j 1 ¯ 2 = 3 kbk2. 2r2 The result is established. 2

We will next compute the third partial derivative of v2(x, y) with respect to x. To

start, let (x, y) be such that s (x, y) 0, and h ∈ n2 . We have 2 KJ2 R

  ∂ T xx ∂ −1 T  −1  −1  −1 h Q h = W H W • s2 ds2, be s2 ds2, be s2 ∂xi ∂xi −1 T ∂  −1  −1  −1 + (WH W ) • s2 ds2, be s2 ds2, be s2 ∂xj −1 T −1 −1 −1 T  −1 2  −1 = 2 WH W ds2e ds2, tie ds2e WH W • s2 ds2, be s2

−1 T ∂  −1  −1  −1 + (WH W ) • s2 ds2, be s2 ds2, be s2 , ∂xi

89 where

∂  −1  −1  −1  −1   −1  −1 s2 ds2, be s2 ds2, be s2 = −2 s2 ds2, tie s2 ds2, be s2 ds2, be ∂xi  −1  −1 + ds2, be s2 ds2, tie s2 ds2, be  −1  −1   −1 + ds2, be s2 ds2, be s2 ds2, tie s2  −1   −1 + s2 dti, be s2 ds2, be  −1   −1 + ds2, be s2 dti, be s2 .

We conclude that the first directional derivative of hTQxxh with respect to x, in the direction h, is given by

m X1 ∂ ∇ hTQxxh [h] = h hTQxxh x i ∂x i=1 i = 2P • b¯ P b¯2 − 3P • b¯3 − P b¯2, b¯ .

By following similar steps as above, we obtain

T xx ¯ ¯ ¯2 ¯ ∇xh R h [h] = 2P • b P b − 4P • b , b , T xx ¯ ¯ ¯ ¯ ¯2 ¯ ¯ ∇xh T h [h] = 4P • b P b P b − 4P • b P b − 2P • b P b .

By combining the previous results, we get

3 ¯ ¯2 ¯3 ¯2 ¯ ∇xxxv2(x, y)[h, h, h] = 12P • b P b − 6P • b − 6P • b , b (4.2.4) +6P • b¯ P b¯ − 8P • b¯ P b¯ P b¯ .

3 We need the following lemma which bounds ∇xxxv2(x, y)[h, h, h].

Lemma 4.2.3. For any h ∈ m1 and (x, y) be such that s (x, y) 0. Then R 2 KJ2

3 ¯ T xx |∇xxxv2(x, y)[h, h, h] | ≤ 30kbkh Q h. (4.2.5)

90 Proof. Note that 1 b¯2 b¯ = b¯3 + b¯2, b¯ . 2

Then (4.2.4) can be rewritten as

3 ¯ ¯2 ¯ ¯ ¯ ¯2 ¯ ∇xxxv2(x, y)[h, h, h] = P b P • 12 b + 6 b − 8 b P b − 12P • b b . (4.2.6) From (4.2.2) we have

12 b¯2 + 6 b¯ − 8 b¯ P b¯  8 b¯2 + 2 b¯ .

By observing that that − b¯2  b¯  b¯2, we obtain

6 b¯2  12 b¯2 + 6 b¯ − 8 b¯ P b¯  18 b¯2 . (4.2.7)

¯ Let λ1, λ2, . . . , λr2 be the eigenvalues of B. Then, for i, j = 1, 2, . . . , r2, the eigenvalues of ¯ b are of the form (1/2)(λi + λj) (see Lemma 1.3.3). We then have

−kb¯k I  b¯  kb¯k I and hence −kb¯kP  P b¯ P  kb¯kP. (4.2.8)

Using (4.2.7), (4.2.8), and the face that b¯2  0, we get

   2        2 P b¯ P • 12 b¯ + 6 b¯ − 8 b¯ P b¯ ≤ 18||b¯kP • b¯ . (4.2.9)

In addition, since the elements b¯2 and b¯ have the same eigenvectors, the matrices b¯2

91 and b¯ also have the same eigenvectors. This implies that

−kb¯k b¯2  b¯2 b¯  kb¯k b¯2 .

Thus we have that

 2    2 P • b¯ b¯ ≤ kb¯kP • b¯ . (4.2.10)

The result follows from (4.2.6), (4.2.9) and (4.2.10). 2 We can now state the proof of Theorem 4.2.1.

Theorem 4.2.1. For any fixed µ > 0, ρk(µ, ·) is µ-self-concordant on F0, for k = 1, 2,...,K.

Proof. By combining the results of (4.2.1), (4.2.3) and (4.2.5), we get

√ 3 3/2 T 2 3/2 ∇xxxv2(x, y¯)[h, h, h] ≤ 30 2n2 (h ∇xxv2(x, y¯)h) , which combined with (4.1.14) implies that

√ 3 3/2 2 3/2 |∇xxxρk(y)[h, h, h]| ≤ 30 2µc2r2 (∇xxv2(x, y¯)[h, h])

−1/2 2 3/2 = 2µ (c2µ∇xxv2(x, y)[h, h])

−1/2 2 3/2 = 2µ (∇xxρk(x)[h, h]) .

The theorem is established. 2

Corollary 4.2.1. For any fixed µ > 0, η(µ, ·) is a µ-self-concordant function on F0.

Proof. It is easy to verify that µc1v1 is µ-self-concordant on F1. The corollary follows from [30, Proposition 2.1.1]. 2

92 4.2.2 Parameters of the self-concordant family

In this subsection, we show that the family of functions {η(µ, ·): µ > 0} is a strongly self-concordant family with appropriate parameters (see definition 3.2.2). The proof of self-concordancy of the family {η(µ, ·): µ > 0} relies on the following two lemmas.

Lemma 4.2.4. For any µ > 0 and x ∈ F0, the following inequality holds:

1 |{∇2 η(µ, x)}0[h, h]| ≤ ∇2 η(µ, x)[h, h], ∀ h ∈ m1 . xx µ xx R

2 Proof. Differentiating ∇xxη(µ, x) in (4.1.15) with respect to µ, we obtain

K 2 0 2 X 2 (k) (k) 3 (k) (k) (k) 0 {∇xxη(µ, x)} = ∇xxv1(x) + {∇xxv2 (x, y¯ ) + µ∇xxy¯v2 (x, y¯ ) ◦ (y¯ ) } k=1 K X (k) 1 = ∇2 v (x) + ∇2 v (x, y¯(k)) = ∇2 η(µ, x). xx 1 xx 2 µ xx k=1

2 The result immediately follows by observing that ∇xxη(µ, x)  0, and therefore, for

n 1 2 any h ∈ R , we have that µ ∇xxη(µ, x)[h, h] ≥ 0. 2

For fixed (x, y¯) with s (x, y¯)  0, let 2 KJ2

¯ l −1/2m l −1/2m l −1/2m l −1/2m ti = s2 ds2, tie s2 e2 and w¯j = s2 ds2, wje s2 e2,

for i = 1, 2, . . . , m1 and j = 1, 2, . . . , m2. We can apply a Gram-Schmidt procedure to

{bw¯ic} and obtain {buic} with k buic k = 1 for all i and ui • uj = 0, i 6= j. Then the

linear span of {buic , i = 1, 2, . . . , m2} is equal to the span of {bw¯ic , i = 1, 2, . . . , m2}.

n2×m m2 T Let U¯ = [u¯ ; u¯ ; ... ; u¯ ] ∈ 2 2 and u = P u¯ . It follows that P = UU . We 1 2 m2 R b k=1 i

93 then have ∂v2(x, y¯) = −P • bt¯ic ∂xi T = −UU • bt¯ic

= −trace(U bt¯ic U) m X2 = − u¯k • (t¯i ◦ u¯k) k=1 (4.2.11) m2 X ¯ 2 = − ti • u¯k k=1 m2 ¯ X 2 = −ti • u¯k k=1 ¯ = −ti • ub, and xx ¯ ¯ Qi,j = P • btic btjc

= trace(U bt¯ic bt¯jc U) m X2 = (u¯k • (t¯i ◦ (t¯j ◦ u¯k))) k=1 m2 (4.2.12) X ¯ 2 ¯ = ti • (u¯k ◦ tj) k=1 m2 ! ! ¯ X 2 ¯ = ti • u¯k ◦ tj k=1 ¯ ¯ = ti • (ub ◦ tj). Lemma 4.2.5. Let (x, y¯) be such that s (x, y¯) 0. Then 2 KJ2

T 2 −1 ∇xv2(x, y¯) {∇xxv2(x, y¯)} ∇xv2(x, y¯) ≤ m2. (4.2.13)

2 ¯ ¯ ¯ ¯ n2×m1 Proof. Let T = [t1; t2; ... ; tm1 ] ∈ R . From (4.2.12) we have that

xx ¯ ¯ ¯ ¯ Qi,j = ti • (ub ◦ tj) = ti • dube tj,

94 and hence xx ¯T ¯ Q = T dube T.

From (4.2.11), we also have T ¯T ∇xv2(x, y¯) = −T ub.

Thus

T xx −1 ¯ ¯T ¯ −1 ¯T ∇xv2(x, y¯) (Q ) ∇xv2(x, y¯) = ub • T (T dube T ) T ub 1/2  1/2 ¯ ¯T ¯ −1 ¯T  1/2 1/2 = ub • ub T (T dube T ) T ub ub 1/2 1/2 ≤ ub • ub = trace(ub)

= m2,

Pm2 2 2 where the last equality follows from the fact that ub = k=1 uk, and that trace(uk) = uk • xx 2 2 −1 xx −1 uk = 1 for each k. In addition, Q  ∇xxv2(x, y¯) implies {∇xxv2(x, y¯)}  (Q ) . This completes the proof. 2

Lemma 4.2.6. For any µ > 0 and x ∈ F0, we have

s (m1c1 + m2c2)(1 + K) |∇ η0(µ, x)T[h]| ≤ ∇2 η(µ, x)[h, h], ∀ h ∈ m1 . x µ xx R

Proof. Differentiating ∇xη(µ, x) in (4.1.15) with respect to µ, we obtain

K 0 X (k) (k) 2 (k) (k) (k) 0 ∇xη (µ, x) = c1∇xv1(x) + c1{∇xv2 (x, y¯ ) + µ∇xy¯v2 (x, y¯ ) · (y¯ ) } k=1 K X (k) (k) = c1∇xv1(x) + c2∇xv2 (x, y¯ ) . k=1

95 In Lemma 4.2.13, we have shown that

(k) (k) T 2 (k) (k) −1 (k) (k) ∇xv2 (x, y¯ ) {∇xxv2 (x, y¯ )} ∇xv2 (x, y¯ ) ≤ m2, which is equivalent to

q (k) (k) 2 (k) (k) m2 (4.2.14) |∇xv2 (x, y¯ )[h]| ≤ m2∇xxv2 (x, y¯ )[h, h], ∀ h ∈ R .

Similarly, we can show that

T 2 −1 ∇xv1(x) {∇xxv1(x)} ∇xv1(x) ≤ m1, which is equivalent to

p 2 m1 (4.2.15) |∇xv1(x)[h]| ≤ m1∇xxv1(x)[h, h], ∀ h ∈ R .

Then, using (4.2.14) and (4.2.15), we have that for all h ∈ Rm1

K !

0 X (k) (k) |∇xη (µ, x)[h]| = c1∇xv1(x) + c2∇xv2 (x, y¯ ) [h] k=1 K X (k) (k) ≤ |c1∇xv1(x)[h]| + |c2∇xv2 (x, y¯ )[h]| k=1 q K q 2 2 X 2 2 (k) (k) ≤ m1c1∇xxv1(x)[h, h] + m2c2∇xxv2 (x, y¯ )[h, h] k=1 K q p 2 X 2 (k) (k) ≤ (m1c1)c1∇xxv1(x)[h, h] + (m2c2)c2∇xxv2 (x, y¯ )[h, h] v k=1 u K ! u 2 X 2 (k) (k) ≤ t(m1c1 + m2c2)(1 + K) c1∇xxv1(x)[h, h] + c2∇xxv2 (x, y¯ )[h, h] k=1 s (m c + m c )(1 + K) = 1 1 2 2 ∇2 η(µ, x)[h, h]. µ xx

96 The result is established. 2

Theorem 4.2.2. The family {η(µ, ·): µ > 0} is a strongly self-concordant family with the following parameters

p(1 + K)(m c + m c ) 1 α (µ) = µ, α (µ) = α (µ) = 1, α (µ) = 1 1 2 2 , α (µ) = . 1 2 3 4 µ 5 µ

Proof. It is direct to see that condition (i) of Definition 3.2.2 holds. Corollary 4.2.1 shows that condition (ii) is satisfied and Lemmas 4.2.4 and 4.2.6 show that condition (iii) is satisfied. 2

4.3 A class of volumetric barrier algorithms for solv-

ing SSPs

In §5 we have established that the parametric functions η(µ, ·) is a strongly self-concordant family. Therefore, it becomes straightforward to develop primal path following interior point algorithms for solving SSP (4.1.3, 4.1.4). In this section we introduce a class of volumetric barrier algorithms for solving this problem. This class is stated formally in Algorithm 2.

0 0 Our algorithm is initialized with a starting point x ∈ F0 and a starting value µ > 0 for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a measure of the proximity of the current point x to the central path, and β as a threshold for that measure. If the current x is too far away from the central path in the sense that δ > β, Newton’s method is applied to find a point close to the central path. Then the value of µ is reduced by a factor γ and the whole precess is repeated until the value of µ is within the tolerance . By tracing the central path as µ approaches zero, a strictly

97 Algorithm 2 Volumetric Barrier Algorithm for Solving SSP (4.1.3,4.1.4) 0 0 Require:  > 0, γ ∈ (0, 1), θ > 0, β > 0, x ∈ F0 and µ > 0. x := x0, µ := µ0 while µ ≥  do for k = 1, 2,...,K do solve (4.1.6) to obtain y¯(k) end for 2 −1 compute ∆x := −{∇xxη(µ, x)} ∇xη(µ, x) using (4.1.15) r 1 compute δ(µ, x) := − ∆xT∇2 η(µ, x)∆x using (4.1.15) µ xx while δ > β do x := x + θ∆x for k = 1, 2,...,K do solve (4.1.6) to obtain y¯(k) end for 2 −1 compute ∆x := −{∇xxη(µ, x)} ∇xη(µ, x) using (4.1.15) r 1 compute δ(µ, x) := − ∆xT∇2 η(µ, x)∆x using (4.1.15) µ xx end while µ := γµ end while feasible -solution to (4.1.6) will be generated.

4.4 Complexity analysis

Theorems 4.4.1 and 4.4.2 present the complexity analysis for two variants of algorithms: short-step algorithms and long-step algorithms, which are controlled by the manner of selection γ that we have made in Algorithm 2. In the short-step version of the algorithm, we decrease the barrier parameter in each

p th iteration by a factor γ := 1−σ/ (1 + K)(m1c1 + m2c2), σ < 0.1. The k iteration of the short-step algorithms is performed as follows: at the beginning of the iteration, we have µ(k−1) and x(k−1) on hand and x(k−1) is close to the center path, i.e., δ(µ(k−1), x(k−1)) ≤ β. After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that δ(µk, x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new

98 point xk with δ(µk, xk) ≤ β. We now show that, in this class of algorithms, only one Newton step is sufficient for recentering after updating the parameter µ. We have the following theorem.

Theorem 4.4.1. Consider Algorithm 2 and let µ0 be the initial barrier parameter,  > 0 √ the stopping criterion, and β = (2 − 3)/2. If the starting point x0 is sufficiently close to the central path, i.e., δ(µ0, x0) ≤ β, then the short-step algorithm reduces the barrier p parameter µ at a linear rate and terminates with at most O( (1 + K)(m1c1 + m2c2) ln(µ0/)) iterations.

In the long-step version of the algorithm, the barrier parameter is decreased by an arbitrary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective function value, however, several damped Newton steps might be needed for restoring the proximity to the central path. The kth iteration of the long-step algorithms is performed as follows: at the beginning of the iteration we have a point x(k−1), which is sufficiently close to x(µ(k−1)), where x(µ(k−1)) is the solution to (4.1.5) for µ := µ(k−1). The barrier parameter is reduced from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then searching is started to find a point xk that is sufficiently close to x(µk). The long-step algorithm

k generates a finite sequence consisting of N points in F0, and we finally take x to be equal to the last point of this sequence. We want to determine an upper bound on N, the number of Newton iterations that are needed to find the point xk. We have the following theorem.

Theorem 4.4.2. Consider Algorithm 2 and let µ0 be the initial barrier parameter,  > 0 the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the central path, i.e., δ(µ0, x0) ≤ β, then the long-step algorithm reduces the barrier parameter

0 µ at a linear rate and terminates with at most O((1+K)(m1c1+m2c2) ln(µ /)) iterations.

The proofs of the Theorems 4.4.1 and 4.4.1 are similar to the proofs of Theorems 3.4.1

99 and 3.4.2 and are given in the subsections below. For convenience, we restate Proposition 3.4.1(i) and Lemma 3.4.1.

2 −1 Proposition 4.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −{∇xxη(µ, x)} q 1 2 m2 ∇xη(µ, x) and δ := µ ∇xxη(µ, x)[∆x, ∆x]. Then for δ < 1, τ ∈ [0, 1] and any h ∈ R we have

2 −2 2 ∇xxη(µ, x + τ∆x)[h, h] ≤ (1 − τδ) ∇xxη(µ, x)[h, h].

Lemma 4.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined

2 −1 q 1 2 + by ∆x := −{∇xxη(µ, x)} ∇xη(µ, x); δ := δ(µ, x) = µ ∇xxη(µ, x)[∆x, ∆x], x = x + ∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ, x+) :=

q 1 2 + + + µ ∇xxη(µ, x )[∆x , ∆x ]. Then the following relations hold:

√  δ 2 δ (i) If δ < 2 − 3, then δ(µ, x+) ≤ ≤ . 1 − δ 2 √ (ii) If δ ≥ 2 − 3, then η(µ, x) − η(µ, x + θ¯∆x) ≥ µ(δ − ln(1 + δ)), where θ¯ = (1 + δ)−1.

4.4.1 Complexity for short-step algorithm

For the purpose of proving Theorems 4.4.1, we present the following proposition which is a restatement of [30, Theorem 3.1.1].  √  + (1+K)(m1c1+m2c2) −1 Proposition 4.4.2. Let χκ(η; µ, µ ) := 1 + κ lnγ . Assume that δ(µ, x) < κ and µ+ := γµ satisfies

δ(µ, x) χ (η; µ, µ+) ≤ 1 − . κ κ

Then δ(µ+, x) < κ.

+ p Lemma 4.4.2. Let µ = γµ, where γ = 1 − σ/ (1 + K)(m1c1 + m2c2) and σ ≤ 0.1, √ and let β = (2 − 3)/2. If δ(µ, x) ≤ β, then δ(µ+, x) ≤ 2β.

100 √ Proof. Let κ := 2β = 2 − 3. Since δ(µ, x) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+ satisfies 1 δ(µ, x) χ (η; µ, µ+) ≤ ≤ 1 − . κ 2 κ

By Proposition 4.4.2, we have δ(µ+, x) ≤ κ. 2 By Lemmas 4.4.1(i) and 4.4.2, we conclude that we can reduce the parameter µ by p the factor γ := 1 − σ/ (1 + K)(m1c1 + m2c2), σ < 0.1, at each iteration, and that only one Newton step is sufficient to restore proximity to the central path. Hence, Theorem 4.4.1 follows.

4.4.2 Complexity for long-step algorithm

For x ∈ F0 and µ > 0, we define the function φ(µ, x) := η(µ, x(µ)) − η(µ, x), which represents the difference between the objective value η(µk, x(k)) at the end of kth iteration and the minimum objective value η(µk, x(µk−1)) at the beginning of kth iteration. Then the task is to find an upper bound on φ(µ, x). To do so, we first give upper bounds on φ(µ, x) and φ0(µ, x) respectively. We have the following lemma.

˜ Lemma 4.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x − x(µ) and define

r 1 δ˜ := δ˜(µ, x) = ∇2 η(µ, x)[∆˜ x, ∆˜ x]. µ xx

˜ For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:

! δ˜ φ(µ, x) ≤ µ + ln(1 − δ˜) , (4.4.1) 1 − δ˜

0 p ˜ |φ (µ, x)| ≤ − (1 + K)(m1c1 + m2c2) ln(1 − δ). (4.4.2)

101 Proof.

Z 1 ˜ ˜ φ(µ, x) := η(µ, x) − η(µ, x(µ)) := ∇xη(µ, x + (1 − τ)∆x)[∆x]dτ. 0

Since x(µ) is the optimal solution, we have

∇xη(µ, x(µ)) = 0. (4.4.3)

Hence, with the aid of Proposition 4.4.1, we get

Z 1 Z 1 2 ˜ ˜ ˜ φ(µ, x) = ∇xxη(µ, x(µ) + (1 − α)∆x)[∆x, ∆x] dαdτ 0 0 Z 1 Z 1 ∇2 η(µ, x)[∆˜ x, ∆˜ x] ≤ xx dαdτ 2 0 0 (1 − δˆ + αδˆ) Z 1 Z 0 µδ˜2 = dαdτ 2 0 1 (1 − δˆ + αδˆ) ! δ˜ = µ + ln(1 − δ˜) , 1 − δ˜ which establishes (4.4.1). Now, for any µ > 0, by applying the chain rule, using (4.4.3), and applying the Mean-Value Theorem, we get

0 0 0 T 0 φ (µ, x) = η (µ, x) − η (µ, x(µ)) − ∇xη(µ, x(µ)) x (µ) = η0(µ, x) − η0(µ, x(µ)) (4.4.4)

˜ T ˜ = ∇xη(µ, x(µ) + $∆x) ∆x,

for some $ ∈ (0, 1). Hence

Z 1 0 ˜ 0 ˜ |φ (µ, x)| = {∇xη(µ, x(µ) + τ∆x)} [∆x] dτ 0

102 Z 1 q 2 ˜ ˜ ˜ ≤ ∇xxη(µ, x(µ) + τ∆x)[∆x, ∆x] q0 ˜ 0 2 ˜ −1 ˜ 0 {∇xη(µ, x(µ) + τ∆x)} • ({∇xxη(µ, x(µ) + τ∆x)} {∇xη(µ, x(µ) + τ∆x)} ) dτ.

In view of Lemma 4.2.6 we have the following estimation

(1 + K)(m c + m c ) −{∇ η(µ, x)}0 • ({∇2 η(µ, x)}−1{∇ η(µ, x)}0) ≤ 1 1 2 2 . (4.4.5) x xx x µ

Then, by using (4.4.5), Proposition 4.4.1, and the observation x(µ)+τ∆˜ x = x−(1−τ)∆˜ x, we obtain

q 1 2 ˜ ˜ s Z ∇xxη(µ, x)[∆x, ∆x] (1 + K)(m c + m c ) |φ0(µ, x)| ≤ 1 1 2 2 dτ 0 1 − (1 − τ)δ˜ µ √ s Z 1 δ˜ µ (1 + K)(m c + m c ) = 1 1 2 2 dτ 0 1 − δ˜ + τδ˜ µ Z 1 p δ˜ = (1 + K)(m1c1 + m2c2) dτ 0 1 − δ˜ + τδ˜ p ˜ = − (1 + K)(m1c1 + m2c2) ln(1 − δ), which establishes (4.4.2). 2

˜ ˜ Lemma 4.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma 4.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then

+ + + + η(µ , x) − η(µ , x(µ )) ≤ O(1)[(1 + K)(m1c1 + m2c2)]µ .

Proof. Differentiating (4.4.4) with respect to µ, we get

00 00 00 0T 0 φ (µ, x) = η (µ, x) − η (µ, x(µ)) − {∇xη(µ, x(µ))} x (µ). (4.4.6)

We will work on the right-hand side of (4.4.6) by bounding the second and the last

00 PK (k)00 (k) terms. Observe that η (µ, x(µ)) = k=1 ρ (µ, x). Differentiating ρ (µ, x) with re-

103 (k)0 (k)0 spect to µ we obtain ρ (µ, x) = c2v2(x, y˜). Now, differentiating ρ (µ, x) with respect

(k)00 T 0 to µ we obtain ρ (µ, x) = c2∇y˜v2(x, y˜) y˜ . Then, differentiating (4.1.13) yields

1 y˜0 = − {∇2 v (x, y˜)}−1∇ v (x, y˜). µ y˜y˜ 2 y˜ 2

Therefore we have

1 ρ00(µ, x) = − c ∇ v (x, y¯)T{∇2 v (x, y¯)}−1∇ v (x, y¯). k µ 2 y¯ 2 y¯y¯ 2 y¯ 2

1 It can be shown (see also [7, Theorem 4.4]) that −ρ00(µ, y) ≤ c m . Thus k µ 2 2

K X 00 m2c2K −η00(µ, x(µ)) = − ρ(k) (µ, x) ≤ . (4.4.7) µr2 k=1

By differentiating (4.4.3) with respect to µ, we get

0 2 0 {∇xη(µ, x(µ))} + ∇xxη(µ, x(µ))x (µ) = 0, or equivalently,

0 2 −1 0 x (µ) = −{∇xxη(µ, x(µ))} ∇xη (µ, x(µ)).

Hence, by using (4.4.5), we have

0T 0 0T 2 −1 0 −{∇xη(µ, x(µ))} x (µ) = {∇xη(µ, x(µ))} {∇xxη(µ, x(µ))} ∇x{η(µ, x(µ))} (1 + K)(m c + m c ) ≤ 1 1 2 2 . µ (4.4.8) Observe that η(µ, x) is concave in µ, it follows that η00(µ, x) ≤ 0. Combining this with (4.4.7) and (4.4.8), we obtain

104 m(1 + K)(m c + 2m c ) φ00(µ, x(µ)) ≤ 1 1 2 2 . (4.4.9) µ

Applying the Mean-Value Theorem and using Lemma 4.4.3 and (4.4.9) to get

Z µ Z µ φ(µ+, x) = φ(µ, x) + φ0(µ, x)(µ+ − µ) + φ00(υ, x) dυdτ µ+ τ ˜ ! δ ˜ p ˜ + ≤ µ + ln(1 − δ) − (1 + K)(m1c1 + m2c2) ln(1 − δ)(µ − µ) 1 − δ˜ Z µ Z τ −1 +m(1 + K)(m1c1 + 2m2c2) υ dυdτ µ+ µ ˜ ! δ ˜ p ˜ + = µ + ln(1 − δ) − (1 + K)(m1c1 + m2c2) ln(1 − δ)(µ − µ) 1 − δ˜

+ τ +m(1 + K)(m1c1 + 2m2c2)(µ − µ ) ln µ ˜ ! δ ˜ p ˜ + ≤ µ + ln(1 − δ) − (1 + K)(m1c1 + m2c2) ln(1 − δ)(µ − µ ) 1 − δ˜

+ −1 +m(1 + K)(m1c1 + 2m2c2)(µ − µ ) lnγ .

(Recall that γ−1 = µ+/µ ≥ τ/µ.) Since δ˜ and γ are constants, the lemma is established. 2 Note that the previous lemma requires δ˜ < 1. However, evaluating δ˜ explicitly may not be possible. In the next lemma we will see that δ˜ is actually proportional to δ, which can be evaluated.

2 −1 Lemma 4.4.5. For any µ > 0 and x ∈ F0, let ∆x := −{∇xxη(µ, x)} ∇xη(µ, x) and ∆˜ x := x − x(µ). We denote

r 1 r 1 δ := δ(µ, x) = − ∇2 η(µ, x)[∆x, ∆x] and δ˜ := δ˜(µ, x) = − ∇2 η(µ, x)[∆˜ x, ∆˜ x]. µ xx µ xx

2 ˜ If δ < 1/6, then 3 δ ≤ δ ≤ 2δ.

Proof. See the proof of Lemma 3.4.5. 2 Theorem 4.4.2 follows by combining Lemmas 4.4.1(ii), 4.4.4, and 4.4.5.

105 Chapter 5

Some Applications

In this chapter of this dissertation, we will turn our attention to present four applications of SSOCPs and SRQCPs. Namely, we describe stochastic Euclidean facility location problem and the portfolio optimization problem with loss risk constraints as two applications of SSOCPs, then we describe the optimal covering random ellipsoid problem and an application in structural optimization as two applications of SRQCPs. We also refer the reader to a paper by Maggioni et al. [24] which describes another important application of SRQCPs in mobile ad hoc networks. The results of this chapter have been submitted for publication [2].

5.1 Two applications of SSOCPs

Each one of the following two subsections is devoted to describe an application of SSOCPs.

5.1.1 Stochastic Euclidean facility location problem

In facility location problems (FLPs) we are interested in choosing a location to build a new facility or locations to build multiple new facilities so that an appropriate measure of distance from the new facilities to existing facilities is minimized. FLPs arise locating

106 airports, regional campuses, wireless communication towers, etc. The following are two ways of classifying FLPs (see also [39]): We can classify FLPs based on the number of new facilities in the following sense: if we add only one new facility then we get a problem known as a single facility location problem (SFLP), while if we add multiple new facilities instead of adding only one, then we get more a general problem known as a multiple facility location problem (MFLP). Another way of classification is based on the distance measure used in the model between the facilities. If we use the Euclidean distance then these problems are called Euclidean facility location problems (EFLPs), if we use the rectilinear distance then these problems are called rectilinear facility location problems (RFLPs). In (deterministic) Euclidean single facility location problem (ESFLP), we are given r

n existing facilities represented by the fixed points a1, a2, ··· , ar in R , and we plan to place a new facility represented by x so that we minimize the weighted sum of the distances

between x and each of the points a1, a2, ··· , ar. This leads us to the problem

Pr min i=1 wi kx − aik2

or, alternatively, to the problem (see for example [42])

Pr min i=1 wi ti

r s.t. (t1; x − a1; ··· ; tr; x − ar) (n+1)r 0, where wi is the weight associated with the ith existing facility and the new facility for i = 1, 2, . . . , r. However, the resulting model is a DSOCP model. So, there is no stochasticity in this model. But what if we assume that some of the fixed existing facilities are random? Where could such a randomness be found in the real world? Picture (A) in Figure 5.1 shows the locations of multi-national military facilities led by troops from the United States and the United Kingdom in the Middle East region as

107 Figure 5.1: The multinational military facilities locations before and after starting the Iraq war. Pictures taken from www.globalsecurity.org/military/facility/centcom.htm. of December, 31, 2002, i.e., before the beginning of the Iraq war (it is known that the war began on March 20, 2003). This picture shows the multi-national military facilities including the navy facility locations and the air force facility locations but not the army facility locations. Assuming that the random existing facilities are the Iraqi military facilities whose locations are unknown at the time of taking this picture which is before the beginning of the war. Assuming also that the new facilities are the army multi-national facilities whose locations have to be determined at the time of taking this picture which is before the beginning of the war. Then it is reasonable to look at a problem of this kind as a stochastic ESFLP. Picture (B) in Figure 5.1 shows the locations of all multi-national military facility locations including the army facility locations as of December, 31, 2002, i.e., after starting the Iraq war. So, in some applications, the locations of existing facilities cannot be fully specified because the locations of some of them depend on information not available at the time when decision needs to be made but will only be available at a later point in time. In general, in order to be precise only the latest information of the random facilities is used. This may require an increasing or decreasing of the number of the new facilities after

108 the latest information about the random facilities become available. In this case, we are interested in stochastic facility location problems (or abbreviated as stochastic FLPs). When the locations of all old facilities are fully specified, FLPs are called deterministic facility location problems (or abbreviated as deterministic FLPs). In this section we consider (both single and multiple) stochastic Euclidean facility location problems, and in the next chapter we describe four different models of FLP, two of them can be viewed as generalizations of the models presented in this section.

Stochastic ESFLPs

n Let a1, a2,..., ar1 be fixed points in R representing the coordinates of r1 existing fixed

n facilities and a˜1(ω), a˜2(ω),..., a˜r2 (ω) be random points in R representing the coordinates of r2 random facilities who realizations depends on an underlying outcome ω in an event space Ω with a known probability function P .

Suppose that at present we do not know the realizations of r2 random facilities, and that at some point in time in future the realizations of these r2 random facilities become known. Our goal is to locate a new facility x that minimizes the weighted sums of the Euclidean distance between the new facility and each one of the existing fixed facilities and also minimizes the expected weighted sums of the distance between the new facility and the realization of each one of the random facilities. Note that this decision needs to be made before the realizations of the r2 random facilities become available. This leads us to the following SSOCP model:

Pr1 min i=1 wi ti + E [Q(x, ω)]

s.t. (t1; x − a1; ... ; tr1 ; x − ar1 ) r1 0, where Q(x, ω) is the minimum value of the problem

109 Pr2 ˜ min j=1 w˜j tj ˜ ˜ s.t. (t1; x − a˜1(ω); ... ; tr2 ; x − a˜r2 (ω)) r2 0, and Z E[Q(x, ω)] := Q(x, ω)P (dω), Ω where wi is the weight associated with the ith existing facility and the new facility for i = 1, 2, . . . , r1 andw ˜j(ω) is the weight associated with the jth random existing facility and the new facility for j = 1, 2, . . . , r2.

Stochastic EMFLPs

n Assume that we need to add m new facilities, namely x1, x2,..., xm ∈ R , instead of adding only one. This may require an increasing or decreasing of the number of the new facilities after the latest information about the random facilities become available. For simplicity, let us assume that the number of new facilities was previously known and fixed. Then we have two cases depending on whether or not there is an interaction among the new facilities in the underlying model. If there is no interaction between the new facilities, we are just concerned in minimizing the weighted sums of the distance between each one of the new facilities on one hand and each one of the fixed facilities and the realization of each one of the random facilities on the other hand. In other words, we solve the following SSOCP model:

Pm Pr1 min j=1 i=1 wij tij + E [Q(x1; ... ; xm, ω)]

s.t. (t1j; xj − a1; ... ; tr1j; xj − ar1 ) r1 0, j = 1, 2, . . . , m, where Q(x1; ··· ; xm, ω) is the minimum value of the problem

110 Pm Pr2 ˜ min j=1 i=1 w˜ij(ω) tij ˜ ˜ s.t. (t1j; xj − a˜1(ω); ... ; tr2j; xj − a˜r2 (ω)) r2 0, j = 1, 2, . . . , m, and Z E[Q(x1; ... ; xm, ω)] := Q(x1; ... ; xm, ω)P (dω). Ω where wij is the weight associated with the ith existing facility and the jth new facility for j = 1, 2, . . . , m and i = 1, 2, . . . , r1, andw ˜ij(ω) is the weight associated with the ith random existing facility and the jth new facility for j = 1, 2, . . . , m and i = 1, 2, . . . , r2. If interaction exists among the new facilities, then, in addition to the above require- ments, we need to minimize the sum of the Euclidean distances between each pair of the new facilities. In this case, we are interested in a model of the form:

m r1 m j−1 P P P P 0 ˆ 0 min j=1 i=1 wijtij + j=1 j0=1 wˆjj tjj + E [Q(x1; ... ; xr1 , ω)]

s.t. (t1j; xj − a1; ... ; tr1j; xj − ar1 ) r1 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j) 0, j = 1, 2, . . . , m − 1,

where Q(x1; ... ; xm, ω) is the minimum value of the problem

Pm Pr2 ˜ Pm Pj−1 ˆ min j=1 i=1 w˜ij(ω) tij + j=1 j0=1 wˆjj0 tjj0 ˜ ˜ s.t. (t1j; xj − a˜1(ω); ... ; tr2j; xj − a˜r2 (ω)) r2 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j) 0, j = 1, 2, . . . , m − 1, and Z E[Q(x1; ... ; xm, ω)] := Q(x1; ... ; xm, ω)P (dω), Ω

111 0 0 wherew ˆjj0 is the weight associated with the new facilities j and j for j = 1, 2, . . . , j − 1 and j = 1, 2, . . . , m.

5.1.2 Portfolio optimization with loss risk constraints

The application in this subsection is a well-known problem from portfolio optimization. We consider the problem of maximizing the expected return subject to loss risk con- straints. The same problem over one period was cited as an application of DSOCP (see Lobo, et al. [23]). Some extensions of this problem will be described.

The problem

Consider a portfolio problem with n assets or stocks over two periods. We start by letting xi denote the amount of asset i held at the beginning of (and throughout) the first period,

n and pi will denote the price change of asset i over this period. So, the vector p ∈ R is the price vector over the first period. For simplicity, we let p be Gaussian with known mean p¯ and covariance Σ, so the return over this period is the (scalar) Gaussian random

T T T variable r = p x with meanr ¯ = p¯ x and variance σ = x Σx where x = (x1; x2; ... ; xn).

Let yi denote the amount of asset i held at the beginning of (and throughout) the

second period, and qi(ω) will denote the random price change of asset i over this period whose realization depends on an underlying outcome ω in an event space Ω with known

probability function P . Similarly, we take the price vector q(ω) ∈ Rn over the second period to be Gaussian with mean q¯(ω) and covariance Σ(˜ ω), so the return over this period is the (scalar) Gaussian random variabler ˜(ω) = q(ω)Ty with mean r˜¯(ω) = q¯(ω)Ty and

T ˜ varianceσ ˜(ω) = y Σ(ω)y where y = (y1; y2; ... ; yn). Suppose that in the first period we

do not know the realization of the Gaussian vector q(ω) ∈ Rn, and that at some point in time in the future (after the first period) the realization of this Gaussian vector becomes known. As pointed out in [23], the choices of portfolio variables x and y involve the

112 (classical, Markowitz) tradeoff between random return mean and random variance.

The optimization variables are the portfolio vectors x ∈ Rn (of the first stage) and y ∈ Rn (of the second stage). For these portfolio vectors, we take the simplest assumptions Pn Pn x ≥ 0, y ≥ 0 (i.e., no short positions [23]) and i=1 xi = i=1 yi = 1 (i.e. unit total budget [23]). Let α andα ˜ be given unwanted return levels over the first and second periods, respec- tively, and let β and β˜ be given maximum probabilities over the first and second periods, respectively. Assuming the above data is given, our goal is to determine the amount of

the asset i (which is xi over the first period and yi over the second period), i.e. determine x and y, such that the expected returns over these two periods are maximized subject to the following two types of loss risk constraints: the constraint P (r ≤ α) ≤ β must be satisfied over the first period, and the constraint P (˜r(ω) ≤ α˜) ≤ β˜ must be satisfied over the second period. This determination needs to be made before the realizations become available.

Formulation of the model

As noted in [23], the constraint P (r ≤ α) ≤ β is equivalent to the second-order cone constraint

 −1 1  α − r¯;Φ (β)(Σ 2 x)  0,

provided β ≤ 1/2 (i.e.,Φ−1(β) ≤ 0), where

z 1 Z 2 Φ(z) = √ e−t /2dt 2π −∞

is the cumulative normal distribution function of a zero mean unit variance Gaussian random variable. To prove this (see also [23]), notice that the constraint P (r ≤ α) ≤ β

113 can be written as r − r¯ α − r¯ P √ ≤ √ ≤ β. σ σ √ Since (r − r¯)/ σ is a zero mean unit variance Gaussian random variable, the probability √ above is simply Φ((α − r¯)/ σ), thus the constraint P (r ≤ α) ≤ β can be expressed as √ √ √ Φ((α − r¯)/ σ) ≤ β or ((α − r¯)/ σ) ≤ Φ−1(β), or equivalentlyr ¯+ Φ−1(β) σ ≥ α. Since

√ √ q T 1/2 T 1/2 1/2 σ = x Σ x = (Σ x) (Σ x) = kΣ xk2,

the constraint P (r ≤ α) ≤ β is equivalent to the the second-order cone inequalityr ¯ +

−1 1/2 Φ (β)kΣ xk2 ≥ α or equivalently to the second-order cone constraint

 −1 1  α − r¯;Φ (β)(Σ 2 x)  0.

Similarly, provided β˜ ≤ 1/2, the constraint P (˜r(ω) ≤ α˜) ≤ β˜ is equivalent to the second-order cone constraint

  α˜ − r˜¯(ω); Φ−1(β˜)(Σ(˜ ω)1/2y)  0.

Our goal is to determine the amount of the asset i (which is xi over the first period

and yi over the second period), i.e. determine x and y, such that the expected returns over these two periods are maximized. This problem can be cast as a two-stage SSOCP as follows:

max p¯T x + E[Q(x, ω)] s.t. α − p¯Tx;Φ−1(β)(Σ1/2x)  0 (5.1.1) 1Tx = 1, x ≥ 0,

where Q(x, ω) is the maximum value of the problem

114 max q¯(ω)T y   s.t. α˜ − q¯(ω)Ty;Φ−1(β˜)(Σ(˜ ω)1/2y)  0 (5.1.2)

1Ty = 1, y ≥ 0, and Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

Note that this formulation is different from one suggested by Mehrotra and Ozevin¨ [27] which minimizes the variance period (two-stage extension of Markowitz’s mean-variance model). Here we formulate a model with a linear objective function leading to another approach for solving this problem.

Extensions

The simple problem described above has many extensions [23]. One of these extensions is imposing several loss risk constraints, i.e., the constraints P (r ≤ αi) ≤ βi, i = 1, 2, . . . , k1

(where βi ≤ 1/2, for i = 1, 2, . . . , k1), or equivalently

−1 1/2  αi − r¯;Φ (βi)(Σ x)  0, for i = 1, 2, . . . , k1

˜ to be satisfied over the first period, and the constraints P (˜r(ω) ≤ α˜j) ≤ βj, j = 1, 2, . . . , k2 ˜ (where βj ≤ 1/2, for j = 1, 2, . . . , k2), or equivalently

  −1 ˜ ˜ 1/2 α˜j − r˜¯(ω); Φ (βj)(Σ(ω) y)  0, for j = 1, 2, . . . , k2 to be satisfied over the second period. So our problem becomes

115 max p¯T x + E[Q(x, ω)] −1 1/2  s.t. αi − r¯;Φ (βi)(Σ x)  0, i = 1, 2, . . . , k1 1Tx = 1, x ≥ 0, where Q(x, ω) is the maximum value of the problem

max q¯(ω)T y   −1 ˜ ˜ 1/2 s.t. α˜j − r˜¯(ω); Φ (βj)(Σ(ω) y)  0, j = 1, 2, . . . , k2 1Ty = 1, y ≥ 0, and Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

As another extension, observe that the statistical models (p¯; Σ) for the price changes   during the first period and q¯(ω), Σ(˜ ω) for the price changes during the second period are both uncertain (regardless of the fact that the later depends on ω while the first does not) and one of the limitations of model in (5.1.1, 5.1.2) is its need to estimate these statistical models. In [11], Bawa et al. indicated that using estimates of unknown expected returns and unknown covariance matrices leads to an estimation risk in portfolio choice. To handle this uncertainty, as in [11], we take those expected returns and covariance matrices that belong to bounded intervals. i.e., we consider the statistical models (p¯; Σ) and (q¯(ω), Σ(˜ ω)) that belong to the bounded sets:

ℵ := {(p¯; Σ) : p¯l ≤ p¯ ≤ p¯u, Σl ≤ Σ ≤ Σu}

116 and ˜ ˜ ˜ ˜ < := {(q¯(ω); Σ(ω)) : q¯l(ω) ≤ q¯(ω) ≤ q¯u(ω), Σl(ω) ≤ Σ(ω) ≤ Σu(ω)},

˜ ˜ respectively, where p¯l, p¯u, q¯l(ω), q¯u(ω), Σl, Σu, Σl(ω) and Σu(ω) are the extreme values of the intervals mentioned above, then we take a worst case realization of the statistical   models (p¯; Σ) and q¯(ω), Σ(˜ ω) by maximizing the minimum of the expected returns over all (p¯; Σ) ∈ ℵ and (q¯(ω), Σ(˜ ω)) ∈ <. Let us consider here only a discrete version of this development. Suppose we have

N1 different possible scenarios over the first period, each of which is modeled by a simple

Gaussian model for the price change vector pk with mean p¯k and covariance Σk. Similarly, we also have N2 different possible scenarios over the second period, each of which is modeled by a simple Gaussian model for the price change ql(ω), with mean q¯l(ω) and covariance Σl(ω). We can then take a worst case realization of these information by maximizing the minimum of the expected returns for these different scenarios, subject to a constraint on the loss risk for each scenario to get the following SSOCP model:

max p¯T x + E[Q(x, ω)]

 T −1 1/2  s.t. α − p¯i x;Φ (β)(Σi x)  0, i = 1, 2,...,N1 1Tx = 1, x ≥ 0, where E[Q(x, ω)] is the maximum value of the problem

max q¯(ω)T y   T −1 ˜ ˜ 1/2 s.t. α˜ − q¯j(ω) y;Φ (β)(Σj(ω) y)  0, j = 1, 2,...,N2 1Ty = 1, y ≥ 0, and Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

117 5.2 Two applications of SRQCPs

The stochastic versions of the two problems in this section have been described and formulated by Ariyawansa and Zhu [49] as SSDP models, but it has been found recently [1, 23, 24] that on numerical grounds solving DSOCPs (SSOCPs) or DRQCPs1 (SRQCPs) by treating them as DSDPs (SSDPs) is inefficient, and that DSOCPs (SSOCPs) or DRQCPs (SRQCPs) can be solved more efficiently by exploiting their structure. In fact, in [1, 23] we see many problems first formulated as DSDPs in [38, 41, 30] have been reformulated as DSOCPs or DRQCP mainly for this reason. In view of this fact, in this section we reformulate SRQCP models (as these problems should be solved as such) for the minimum- volume covering random ellipsoid problem and the structural optimization problem.

5.2.1 Optimal covering random ellipsoid problem

The problem

The problem description in this part is taken from Ariyawansa and Zhu [9, Subsection

3.2]. Suppose we have n1 fixed ellipsoids:

n T T n Ei := {x ∈ R : x Hix + 2gi x + vi ≤ 0} ⊂ R , i = 1, 2, . . . , n1

n n where Hi ∈ S+, gi ∈ R and vi ∈ R are deterministic data for i = 1, 2, . . . , n1.

Suppose we also have n2 random ellipsoids:

˜ n T ˜ T n Ei(ω) := {x ∈ R : x Hi(ω)x + 2g˜i(ω) x +v ˜i(ω) ≤ 0} ⊂ R , i = 1, 2, . . . , n2,

˜ n n where, for i = 1, 2, . . . , n2, Hi(ω) ∈ S+, g˜i(ω) ∈ R andv ˜i(ω) ∈ R are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known

1The acronym DRQCPs stands for deterministic rotated quadratic cone programs.

118 probability function P . We assume that, at present time, we do not know the realizations of n2 random ellipsoids, and that at some point in the future the realizations of these n2 ellipsoids become known.

Given this, we need to determine a ball that contains all n2 fixed ellipsoids and the realizations of the n2 random ellipsoids. This decision needs to be made before the real- izations of the random ellipsoids become available. Consequently, when the realizations of the random ellipsoids do become available, the ball that has already been determined may or may not contain all the realized random ellipsoids. In order to guarantee that the modified ball contains all (fixed and realizations of random) ellipsoids, we assume that at that stage we are allowed to change the radius of the ball (but not its center), if necessary. We consider the same assumptions as in [49]. We assume that the cost of choosing the ball has three components:

• the cost of the center, which is proportional to the Euclidean distance to the center from the origin;

• the cost of the initial radius, which is proportional to the square of the radius;

• the cost of changing the radius. The center and the radius of the initial ball are determined so that the expected total cost is minimized.

In [49], Ariyawansa and Zhu describe the following concrete version of this generic application: Let n := 2. The fixed ellipsoids contain targets that need to be destroyed, and the random ellipsoids contain targets that also need to be destroyed but are moving. Fighter aircrafts take off from the origin with a planned disk of coverage that contains the fixed ellipsoids. In order to be accurate only the latest information about the moving targets is used. This may require increasing the radius of the planned disk of coverage after latest information about the random ellipsoids become available, which may occur after the initially planned fighter aircrafts have taken off. This increase, dependent on

119 the initial disk of coverage and the specific information about the moving targets, may result in an additional cost. The initial disk of coverage need to be determined so that the expected total cost is minimized.

Our first goal is to determine x¯ ∈ Rn and γ ∈ R such that the ball B defined by

n T T B := {x ∈ R : x x − 2x¯ x + γ ≤ 0}

contains the fixed ellipsoids Ei for i = 1, 2, . . . , n1. As we mentioned, this determination need to be determined before the realization of the random ellipsoids become available. When the realizations of the random ellipsoids become available, if necessary, we need to determineγ ˜ so that the new ball

n T T B˜ := {x ∈ R : x x − 2x¯ x +γ ˜ ≤ 0}

˜ contains all the realizations of the random ellipsoids Ei(ω) for i = 1, 2, . . . , n2. Notice that the center of the ball B is x¯, its radius is r := px¯Tx¯ − γ, and the distance √ from the origin to its center is x¯Tx¯. Notice also that the new ball B˜ has the same center x¯ as B but a larger radiusr ˜ := px¯Tx¯ − γ˜.

Formulation of the model

2 T 2 T We introduce the constraints d1 ≥ x¯ x¯ and d2 ≥ r = x¯ x¯ − γ. That is, d1 is an upper √ T bound on the distance between the center of the ball B and the origin, x¯ x¯, and d2 is an upper bound on square of the radius of the ball B. In order to proceed, we feel that it is necessary for the reader to recall the following fact:

n Fact 1 (Sun and Freund [37]). Suppose that we are given two ellipsoids Ei ⊂ R , i = 1, 2

n T T n n defined by Ei := {x ∈ R : x Hix + 2gi x + vi ≤ 0}, where Hi ∈ S+, gi ∈ R and vi ∈ R

120 for i = 1, 2, then E1 contains E2 if and only if there exists τ ≥ 0 such that the linear matrix inequality     H1 g1 H2 g2    τ    T   T  g1 v1 g2 v2

holds.

In view of Fact 1 and the requirement that the ball B contains the fixed ellipsoids Ei ˜ for i = 1, 2, . . . , n1, and the realizations of the random ellipsoids Ei(ω) for i = 1, 2, . . . , n2, we accordingly add the following constraints:

    I −x¯ Hi gi    τi   , i = 1, 2, . . . , n1,  T   T  −x¯ γ gi vi

and     ˜ I −x¯ Hi(ω) g˜i(ω)    δi   , i = 1, 2, . . . , n2,  T   T  −x¯ γ˜ gi (ω) vi(ω)

or equivalently

˜ Mi  0, ∀i = 1, . . . , n1 and Mi(ω)  0, i = 1, 2, . . . , n2

where for each i = 1, . . . , n1,

  τiHi − I τigi + x¯ Mi :=   ,  T T  τigi + x¯ τivi − γ

and for each i = 1, . . . , n2,

  δ H˜ (ω) − I δ g˜ (ω) + x¯ ˜ i i i i Mi(ω) :=   .  T T  δig˜i (ω) + x¯ δivi(ω) − γ˜

121 Since we are looking to minimizing d2, where d2 is an upper bound on square of the

T T radius of the ball B, we can write the constraint d2 ≥ x¯ x¯ − γ as d2 = x¯ x¯ − γ. So, the matrix Mi can be then written as

  τiHi − I τigi + x¯ Mi =   .  T T T  τigi + x¯ τivi + d2 − x¯ x¯

T Now, let Hi := ΞiΛiΞi be the spectral decomposition of Hi, where Λi := diag(λi1; ... ; λin),

T and let ui := Ξi (τigi + x¯). Then, for each i = 1, . . . , n1, we have

      ΞT 0 Ξ 0 τ Λ − I u ¯ i i i i i Mi :=   Mi   =   .  T   T   T T  0 1 0 1 ui τivi + d2 − x¯ x¯

¯ Consequently, Mi  0 if and only if Mi  0 for each i = 1, 2, . . . , n1. Now our formulation of the problem in SSOCP depends on the following lemma (see also [23]):

¯ Lemma 5.2.1. For each i = 1, 2, . . . , n1, Mi  0 if and only if τiλmin(Hi) ≥ 1 and

T T 2 x¯ x¯ ≤ d2 + τivi + 1 si, where si = (si1; ... ; sin), sij = uij/(τiλij − 1) for all j such that

τiλij > 1, and sij = 0 otherwise. ¯ Proof. For each i = 1, 2, . . . , n1, it is known that the matrix Mi  0 if and only if every ¯ principle minor of Mi is nonnegative. Since

τiλi1 − 1 0 ... 0

0 τiλi2 − 1 ... 0 det(τ Λ − I) = = Πn1 (τ λ − 1). i i . . . . j=1 j ij . . .. .

0 0 . . . τn1 λin1 − 1

¯ s It follows that Mi  0 if and only if Πj=1(τjλij − 1) ≥ 0, for all s = 1, . . . , n1 and ¯ ¯ ¯ det(Mi) ≥ 0. Thus, Mi  0 if and only if τjλij ≥ 1, for all j = 1, . . . , n1 and det(Mi) ≥ 0.

122 Notice that

k ! k ! ¯ Y T X 2 det(Mi) = (τiλij − 1) (τivi + d2 − x¯ x¯) − uij . j=1 j=1

¯ This means the inequality det(Mi) ≥ 0 strictly holds for each i ≤ j ≤ k such that ¯ τjλij = 1. Hence, det(Mi) ≥ 0 if and only if

T T T X 2 (τivi + d2 − x¯ x¯) − 1 si = (τivi + d2 − x¯ x¯) − (uij/(τjλij − 1)) ≥ 0.

τiλij >1

¯ T T Therefore, Mi  0 if and only if τiλmin(Hi) ≥ 1 and d2 ≥ x¯ x¯ − τivi + 1 si. 2 ¯ So far, we have shown that each constraint Mi  0 can be replaced by the constraints

T T τiλmin(Hi) ≥ 1, x¯ x¯ ≤ σ and σ ≤ d2 + τivi − 1 si. ˜ ˜T ˜ ˜T Similarly, for each i = 1, 2, . . . , n2, by letting Hi := Ξi ΛiΞi (the spectral decomposi- ˜ ˜T 2 ˜ tion of Hi), u˜i := Ξi (δig˜i + x¯), and s˜i := (˜si1;s ˜i2; ... ;s ˜in), wheres ˜ij :=u ˜ij/(δiλij − 1) ˜ for all j when δiλij > 1, ands ˜ij := 0 otherwise, then we can show that each constraint ˜ ˜ ˜ T Mi(ω)  0 can be replaced by the constraints δiλmin(Hi(ω)) ≥ 1, x¯ x¯ ≤ σ˜(ω), and ˜ T σ˜(ω) ≤ d2 + δivi(ω) − 1 s˜i(ω). ˜ Since we are minimizing d2 and d2, then for all j = 1, 2, . . . , n, we can relax the

2 definitions of sij ands ˜ij by replacing them by uij ≤ sij(τiλij − 1) for all i = 1, 2, . . . , n1 2 ˜ andu ˜ij ≤ s˜ij(δiλij − 1) for all i = 1, 2, . . . , n2, respectively. When the realizations of the random ellipsoids become available, if necessary, we determine λ˜ so that the new ball

n T T B˜ := {x ∈ R : x x − 2x¯ x + λ˜ ≤ 0} contains all the realizations of the random ellipsoid. This new ball B˜ has the same center x¯ √ as B but a larger radius,r ˜ := x¯Tx¯. We note thatr ˜2 −r2 = (x¯Tx¯−γ˜)−(x¯Tx¯−γ) = γ−γ˜,

123 and thus we introduce the constraint 0 ≤ γ − γ˜ ≤ z where z is an upper bound ofr ˜2 − r2. Letc ¯ > 0 denote the cost per unit of the Euclidean distance between the center of the ball B and the origin; let α > 0 be the cost per unit of the square of the radius of B; and let β > 0 be the cost per unit increase of the square of the radius if it becomes necessary after the realizations of the random ellipsoids are available. We now define the following decision variables

x := (d1; d2; x¯; γ; τ ) and y := (z;γ ˜; δ).

Then, by introducing the following unit cost vectors

c := (¯c; α; 0; 0; 0) and q := (β; 0; 0), and combining all of the above, we get the following SQRCP model:

min cTx + E[Q(x, ω)]

T s.t. ui = Ξi (τigi +x ¯), i = 1, 2, . . . , n1

2 uij ≤ sij(τiλij − 1), i = 1, 2, . . . , n1, j = 1, 2, . . . , n

T x¯ x¯ ≤ σ1

T σ1 ≤ d2 + τivi − 1 si, i = 1, 2, . . . , n1

τi ≥ 1/λmin(Hi), i = 1, 2, . . . , n1

T 2 x¯ x¯ ≤ d1, where Q(x, ω) is the minimum value of the problem

124 min qTy

˜T T s.t. u˜i(ω) = Ξi (ω) (δig˜i(ω) + x¯), i = 1, 2, . . . , n2 2 ˜ u˜ij(ω) ≤ s˜ij(ω)(δiλij(ω) − 1), i = 1, 2 . . . , n2, j = 1, 2, . . . , n

T x¯ x¯ ≤ σ˜2(ω) ˜ T σ˜2(ω) ≤ d2 + δivi(ω) − 1 s˜i(ω), i = 1, 2, . . . , n2 ˜ ˜ δi ≥ 1/λmin(Hi)(ω), i = 1, 2, . . . , n2 0 ≤ δ − δ˜ ≤ z, and Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

5.2.2 Structural optimization

Ben-Tal and Bendsøe in [12] and Nemirovski [13] consider the following problem from structural optimization. A structure of k linear elastic bars connects a set of p nodes. They assume the geometry (topology and lengths of bars) and the material are fixed. The goal is to size the bars, i.e., determine appropriate cross-sectional areas of the bars. For i = 1, 2, . . . , k, and j = 1, 2, . . . , p, we define the following decision variables and parameters:

th • fj := the external force applied on the j node,

th • dj := the (small) displacement of the j node resulting from the load force fj,

th • xi:= the cross-sectional area of the i bar,

th • xi:= the lower bound on the cross-sectional area of the i bar,

th • xi:= the upper bound on the cross-sectional area of the i bar,

th • li := the length of the i bar,

125 • v := the maximum allowed volume of the bars of the structure,

Pk p • G(x) := i=1 xiGi is the stiffness matrix, where the matrices Gi ∈ S , i = 1, 2, . . . , k depend only on fixed parameters (such as length of bars and material).

In the simplest version of the problem they consider one fixed set of externally applied nodal forces fj, j = 1, 2, . . . , p. Given this, the elastic stored energy within the structure is given by ε = f Td, which is a measure of the inverse of the stiffness of the structure. In view of the definition of the stiffness matrix G(x), we can also conclude that the following linear relationship between f and d: f = G(x) d.

The objective is to find the stiffest truss by minimizing ε subject to the inequality lT x ≤ v as a constraint on the total volume (or equivalently, weight) and the constraint x ≤ x ≤ x¯ as upper and lower bounds on the cross-sectional areas. For simplicity, we assume that x > 0 and G(x) 0, for all x > 0. In this case we can express the elastic stored energy in terms of the inverse of the stiffness matrix and the external applied nodal force as follows:

ε = f TG(x)−1f.

In summary, they consider the problem

min f T G(x)−1f s.t. x ≤ x ≤ x¯ lT x ≤ v,

126 which is equivalent to min s s.t. f T G(x)−1f ≤ s (5.2.1) x ≤ x ≤ x¯ lT x ≤ v.

The first inequality constraint in (5.2.1) is just fractional quadratic function inequality constraint and it can be formulated as a hyperbolic inequality. In §2 of [1] (see also §2 of [23]), the authors demonstrate that this inequality is satisfied if and only if there exists

ri ui ∈ R and ti ∈ R with ti > 0, i = 1, 2, . . . , k, such that

k X T T T Di ui = f, ui ui ≤ xiti, for i = 1, 2, . . . , k, and 1 t ≤ s, i=1

ri×n T where ri = rank(Gi) and the matrix Di ∈ R is the Cholesky factor of Gi (i.e., Di Di =

Gi) for each i = 1, 2, . . . , k. Using this result, problem (5.2.1) becomes the problem

min s

Pk T s.t. i=1 Di ui = f

T ui ui ≤ xiti, i = 1, . . . , k (5.2.2) 1T t ≤ s x ≤ x ≤ x¯ lT x ≤ v,

which includes only linear and hyperbolic constraints. The model (5.2.2) depends on a “simple” assumption that says that the external forces applied to the nodes are fixed. As more complicated versions, they consider multiple loading scenarios as well. In [49], Ariyawanza and Zhu consider the case that external forces applied to the nodes are random variables with known distribution functions and they formulate an SSDP model for this problem. In this subsection, we formulate an

127 SRQCP model for this problem under the assumption that some of the external forces applied to the nodes are fixed and the rest are random variables with known distribution

th ˜ functions. Let us denote to the external force applied on the j node by fj if it is ˜ fixed and by fj(ω) if it is random. Due to the changes in the environmental conditions (such as wind speed and temperature), we believe that the randomness of some of the external forces is much closer to the reality. Without loss of generality, let us assume that f(ω) = (f˜; f˜(ω)) where f˜(ω) depends on an underlying outcome ω in an event space Ω with a known probability function P . Accordingly, the displacement of the jth node resulting from the random forces and the elastic stored energy within the structure are also random. Then we have the following relations

ε˜ = f˜Td˜, f˜ = G(x) d˜, ε˜(ω) = f˜(ω)Td˜(ω), and f˜(ω) = G(x) d˜(ω).

Suppose we design a structure for a customer. The structure will be installed in an open environment. From past experience, the customer can provide us with sufficient in- formation so that we can model the external forces that will be applied to the nodes of the structure as random variables with known distribution functions. Given this information, we can formulate a model of this problem so that we can guarantee that our structure will be able to continue to function and stand up against the worst environmental conditions.

128 In summary, we solve the following SRQCP problem:

mins ˜ + E [Q(x, ω)] Pk T ˜ s.t. i=1 Di u˜i = f T ˜ u˜i u˜i ≤ xiti, i = 1, . . . , k 1T t˜ ≤ s˜ x ≤ x ≤ x¯ lT x ≤ v, where Q(x, ω) is the minimum value of the problem

min s˜

Pk T ˜ ˜ s.t. i=1 Di ˜ui(ω) = f(ω)

T ˜ ˜˜ui(ω) ˜˜ui(ω) ≤ xit˜i, i = 1, . . . , k 1T ˜t ≤ s,˜ and Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

129 Chapter 6

Related Open problems: Multi-Order Cone Programming Problems

In this chapter we introduce a new class of convex optimization problems that can be viewed as an extension of second-order cone programs. We present primal and dual forms of multi-order cone programs (MOCPs) in which we minimize a linear function over a Cartesian product of pth-order cones (we allow different p values for different cones in the product). We then indicate weak and strong duality relations for the problem. We also introduce mixed integer multi-order cone programs (MIMOCPs) to handle MOCPs with integer-valued variables, and two-stage stochastic multi-order cone programs (SMOCPs) with recourse to handle uncertainty in data defining (deterministic) MOCPs. We demon- strate how decision making problems associated with facility location problems lead to MOCP, MIMOCP, and SMOCP models. It is interesting to investigate other applicational settings leading to (deterministic, stochastic and mixed integer) MOCPs. Development of algorithms for such multi-order cone programs which in turn will benefit from a duality theory is equally interesting and remains one of the important open problems for future

130 research. We begin by introducing some notations we use in the sequel. Given p ≥ 1, the pth-order cone of dimension n is defined as

n n−1 Qp := {x = (x0; x¯) ∈ R × R : x0 ≥ kx¯kp}

n where k · kp denotes the p-norm. The cone Qp is regular (see, for example, [45]). As

n n special cases, when p = 2 we obtain Q2 := E+; the second-order cone of dimension n, and

n when p = 1 or ∞, Qp is a polyhedral cone.

hni n We write x hpi 0 to mean that x ∈ Qp . Given 1 ≤ pi ≤ ∞ for i = 1, 2, . . . , r. We hn ,n ,...,n i write x  1 2 r 0 to mean that x ∈ Qn1 × Qn2 × · · · × Qnr . It is immediately seen hp1,p2,...,pri p1 p2 pr that, for every vector x ∈ n where n = Pr n , x hn1,n2,...,nri 0 if and only if x is R i=1 i hp1,p2,...,pri partitioned conformally as x = (x ; x ; ... ; x ) and x hnii 0 for i = 1, 2, . . . , r. For 1 2 r i hpii simplicity, we write:

n hni •Q p as Qp and x hpi 0 as x hpi 0 when n is known from the context;

hn ,n ,...,n i •Q n1 × Qn2 × · · · × Qnr as Q and x  1 2 r 0 as x  0 when p1 p2 pr hp1,p2,...,pri hp1,p2,...,pri hp1,p2,...,pri

n1, n2, . . . , nr are known from the context;

• x hp, p, . . . , pi 0 as x rhpi 0.

| r times{z } n n The set of all interior points of Qp is denoted by int(Qp ) := {x = (x0; x¯) ∈ R × n−1 : x > kx¯k }. We write x 0 to mean that x ∈ int(Q ) := R 0 p hp1,p2,...,pri hp1,p2,...,pri

int(Qp1 ) × int(Qp2 ) × · · · × int(Qpr ).

6.1 Multi-order cone programming problems

Let r ≥ 1 be an integer, and p1, p2, . . . , pr are such that 1 ≤ pi ≤ ∞ for i = 1, 2, . . . , r. Pr Let m, n, n1, n2, . . . , nr be positive integers such that n = i=1 ni. Then we define an

131 MOCP in primal standard form as

min cTx (P ) s.t. A x = b x  0, hp1,p2,...,pri where A ∈ Rm×n, b ∈ Rm and c ∈ Rn constitute given data, and x ∈ Rn is the primal decision variable. We define a MOCP in dual standard form as

max bTy (D) s.t. ATy + z = c z  0, hq1,q2,...,qri

m n where y ∈ R and z ∈ R are the dual decision variables, and q1, q2, . . . , qr are integers

such that 1 ≤ qi ≤ ∞ for i = 1, 2, . . . , r.

If (P) and (D) are defined by the same data, and qi is conjugate to pi, in the sense

that 1/pi + 1/qi = 1 for i = 1, 2, . . . , r, then we can prove relations between (P) and (D) (see §3) to justify referring to (D) as the dual of (P) and vice versa. A pth-order cone programming (POCP) problem in primal standard form is

min cTx s.t. A x = b (6.1.1)

x rhpi 0,

Pr where m, n, n1, n2, . . . , nr are positive integers such that n = i=1 ni, p ∈ [1, ∞], A ∈ Rm×n, b ∈ Rm and c ∈ Rn constitute given data, x ∈ Rn is the primal variable. According to (D), the dual problem associated with POCP (6.1.1) is

132 max bTy s.t. ATy + z = c (6.1.2)

z rhqi 0, where y ∈ Rm and z ∈ Rn are the dual variables and q is conjugate to p. Clearly, second-order cone programs are a special case of POCPs which occurs when p = 2 in (6.1.1) (hence q = 2 in (6.1.2)).

Example 21. Norm minimization problems: In [1], Alizadeh and Goldfarb presented second-order cone programming formulations of three norm minimization problems where the norm is the Euclidean norm. In this subsec- tion we indicate extensions of these three problems where we use arbitrary p norms leading

ni−1 to MOCPs. Let vi = Aix + bi ∈ R , i = 1, 2, . . . , r. The following norm minimization problems can be cast as MOCPs:

Pr 1. Minimization of the sum of norms: The problem min i=1 kvi||pi can be formulated as Pr min i=1 ti

s.t. Aix + bi = vi, i = 1, 2, . . . , r (t ; v ; t ; v ; ... ; t ; v )  0. 1 1 2 2 r r hp1,p2,...,pri

2. Minimization of the maximum of norms: The problem min max1≤i≤r kvi||pi can be expressed as the MOCP problem

min t

s.t. Aix + bi = vi, i = 1, 2, . . . , r (t; v ; t; v ; ... ; t; v )  0. 1 2 r hp1,p2,...,pri

3. Minimization of the sum of the k largest norms: More generally, the problem of minimizing the sum of the k largest norms can also be cast as an MOCP. Let the

133 norms kv[1]kp[1] , kv[2]kp[2] ,..., kv[r]kp[r] be the norms kv1||p1 , kv2||p2 ,..., kvr||pr sorted Pr in nonincreasing order. Then the problem min i=1 kv[i]kp[i] can be formulated (see also [1] and [23] and the related references contained therein) as the MOCP problem

Pr min i=1 si + kt

s.t. Aix + bi = vi, i = 1, 2, . . . , r (s + t; v ; s + t; v ; ... ; s + t; v )  0 1 1 2 2 r r hp1,p2,...,pri

si ≥ 0, i = 1, 2, . . . , r.

6.2 Duality

Since MOCPs are a class of convex optimization problems, we can develop a duality theory for them. Here we indicate weak and strong duality for the pair (P, D) as justification for referring to them as a primal dual pair. It was shown by Faraut and Kor´anyi in [18, Chapter I.2] that the second-order cone is self-dual. We now prove the more general result that the the dual of the pth-order cone of dimension n is the qth-order cone of dimension n, where q is the conjugate to p.

∗ Lemma 6.2.1. Qp = Qq, where 1 ≤ p ≤ ∞ and q is the conjugate to p. More gen-

∗ erally, Qhp1,p2,...,pri = Qhq1,q2,...,qri, where 1 ≤ pi ≤ ∞ and qi is the conjugate to pi for i = 1, 2, . . . , r. Proof. The proof of the second part trivially follows from the first part and the fact

∗ ∗ ∗ ∗ that (K1 × K2 × · · · × Kr) = K1 × K2 × · · · × Kr . To prove the first part, we first ∗ ∗ prove that Qq ⊆ Qp . Let x = (x0; x¯) ∈ Qq, we show that x ∈ Qp by verifying that

T T T x y¯ ≥ 0 for any y ∈ Qp. So let y = (y0; y¯) ∈ Qp. Then x y = x0 y0 + x¯ y¯ ≥

T T T kx¯kqky¯kp + x¯ y¯ ≥ |x¯ y¯| + x¯ y¯ ≥ 0, where the first inequality follows from the fact that

∗ x ∈ Qq and y ∈ Qp and the second one from H¨older’sinequality. Now we show Qp ⊆ Qq.

∗ Let y = (y0; y¯) ∈ Qp , we show that y ∈ Qq by verifying that y0 ≥ ky¯kq. This is trivial if

134 p/q p/q p/q y¯ = 0 or p = ∞. If y¯ 6= 0 and 1 ≤ p < ∞, let u := (y1 ; y2 ; ... ; yn−1 ) and consider x := (kukp; −u) ∈ Qp. Then by using H¨older’sinequality, where the equality is attained,

T T we obtain 0 ≤ x y = kukp y0 − u y¯ = kukp y0 − kukpky¯kq = kukp (y0 − ky¯kq). This gives that y0 ≥ ky¯kq. 2

th ∗∗ From this lemma we deduce that the p -order cone is reflexive, i.e., Qp = Qp, and

more generally, Qhp1,p2,...,pri is also reflexive. On the basis of this fact, it is natural to infer that the dual of the dual is the primal. In view of the above lemma, problem (D) can be derived from (P) through the usual Lagrangian approach. The Lagrangian function for (P) is

L(x, λ, ν) = cTx − λT(A x − b) − νTx.

The dual objective is

q(λ, ν) := inf L(x, λ, ν) = inf (c − ATλ − ν)Tx + λTb. x x

In fact, we may call the constraint x  0 as the “nonnegativity of x”, but, hp1,p2,...,pri with respect to the multi-order cone Qhp1,p2,...,pri. Note that the Lagrange multiplier ν corresponding to the inequality constraint x  0 is restricted to be nonnegative hp1,p2,...,pri with respect to the dual of Q (i.e., ν  0), whereas the Lagrange mul- hp1,p2,...,pri hq1,q2,··· ,qri tiplier λ corresponding to the equality constraint Ax − b = 0 is unrestricted. Hence the dual problem is obtained by maximizing q(λ, ν) subject to ν  0. hq1,q2,...,qri If c − ATλ − ν 6= 0, the infimum is clearly −∞. So we can exclude λ for which c − ATλ − ν 6= 0. When c − ATλ − ν = 0, the dual objective function is simply λTb.

135 Primal Minimum Maximum Dual C vector:  vector:  V hp1,p2,...,pri hq1,q2,...,qri O vector:  vector:  A hp1,p2,...,pri hq1,q2,...,qri N vector or scalar: ≥ vector or scalar: ≥ R S vector or scalar: ≤ vector or scalar: ≤ B T vector or scalar: = vector or scalar: free L V vector:  vector:  C hp1,p2,...,pri hq1,q2,...,qri A vector:  vector:  O hp1,p2,...,pri hq1,q2,...,qri R vector or scalar: ≥ vector or scalar: ≤ N B vector or scalar: ≤ vector or scalar: ≥ S L vector or scalar: free vector or scalar: = T

Table 6.1: Correspondence rules between primal and dual MOCPs.

Hence, we can write the dual problem as follows:

max bTλ s.t. ATλ + ν = c (6.2.1) ν  0. hq1,q2,...,qri

Replacing λ by x and ν by z in (6.2.1) we get (D). In general, MOCPs can be written in a variety of forms different from the standard form (P, D). The situation in MOCPs is similar to that for linear programs; any MOCP problem can be written in the standard form. However, if we consider MOCPs in other forms, then it is more convenient to apply the duality rules directly. Table 6.1 is a summary of these rules. This table is a generalization of a similar table in [14, Section 4.2]. For instance, using this table, the dual of (P) is the problem

max bTλ s.t. ATλ  c, hq1,q2,...,qri which is equivalent to problem (6.2.1) where ATλ  c means that c−ATλ  0. hq1,q2,...,qri hq1,q2,...,qri

136 Using Lemma 6.2.1, we can prove the following weak duality property.

Theorem 6.2.1. (Weak duality) If x is any primal feasible solution of (P) and (y, z) is any feasible solution of (D), then the duality gap cTx − bTy = xTz ≥ 0. Proof. Note that cTx − bTy = (ATy + z)Tx − bTy = yTAx + zTx − yTb = yT(Ax −

T T ∗ b) + z x = x z. Since x ∈ Qhp1,p2,...,pri and z ∈ Qhq1,q2,...,qri = Qhp1,p2,...,pri , we conclude that xTz ≥ 0. 2

We now give conditions for strong duality to hold. We say that problem (P) is strictly

feasible if there exists a primal feasible point xˆ such that xˆ hp1,p2,...,pri 0. In the remaining part of this section, we assume that the m rows of the matrix A are linearly independent. Using the Karush-Kuhn-Tucker (KKT) conditions, we state and prove the following strong duality result.

Theorem 6.2.2. (Strong duality I) Consider the primal–dual pair (P, D). If (P) is strictly feasible and solvable with a solution x, then (D) is solvable and the optimal values of (P) and (D) are equal. Proof. By the assumptions of the theorem, x is an optimal solution of (P) where we can apply the KKT conditions. This implies that there are Lagrange multiplier vectors λ and ν such that (x, λ, ν) satisfies

A x = b, x  0, hp1,p2,...,pri ATλ + ν = c, ν  0, hq1,q2,...,qri T xi νi = 0, for i = 1, 2, . . . , r.

This implies that (λ, ν) is feasible for the dual problem (D). Let (y, z) be any feasible solution of (D), then we have that bTy ≤ cTx = xTν + bTλ = bTλ, where we used the weak duality to obtain the inequality and the complementary slackness to obtain the last equality. Thus, (λ, ν) is an optimal solution of (D) and cTx = bTλ as desired. 2

137 Note that the result in Theorem 6.2.1 is symmetric between (P) and (D). The following strong duality result can also be obtained by applying the duality relations [30, Theorem 4.2.1] to our problem formulation,

Theorem 6.2.3. (Strong duality II) Consider the primal–dual pair (P, D). If both (P) and (D) have strictly feasible solutions, then they both have optimal solutions x∗ and (y∗, z∗), respectively, and p∗ := cTx∗ = d∗ := bTy∗ (i.e., x∗Tz∗ = 0 (complementary slackness)).

From the above results, we get the following corollary.

Corollary 6.2.1. (Optimality conditions) Assume that both (P) and (D) are strictly feasible, then (x, y, z) ∈ Rn+m+n is a pair of optimal solutions if and only if

A x = b, x  0, hp1,p2,...,pri ATy + z = c, z  0, hq1,q2,...,qri T xi zi = 0, for i = 1, 2, . . . , r.

6.3 Multi-oder cone programming problems over in-

tegers

In this section we introduce two important related problems that result when decision variables in an MOCP can only take integer values. Consider the MOCP problem (P). If we require an additional constraint that a subset of the variables have to attain 0-1 values, then we are interested in optimization problem of the form

138 min cTx s.t. Ax = b x 0 hp1,p2,...,pri

xk ∈ {0, 1}, k ∈ Γ,

n where Γ ⊆ {1, 2, . . . , n}, the decision variable x ∈ R has some of its components xk

(k ∈ Γ) with integer values and bounded by αk, βk ∈ R. This class of optimization problems may be termed as 0-1 multi-order cone programs (0-1MOCPs). A more general and interesting problem when in a MOCP some variables can only take integer values. If we are given the same data A, b, and c as in (P), then we are interested in the problem of the form

min cTx s.t. Ax = b (6.3.1) x 0 hp1,p2,...,pri T xk ∈ [αk, βk] Z, k ∈ Γ,

n where Γ ⊆ {1, 2, . . . , n}, the decision variable x ∈ R has some of its components xk

(k ∈ Γ) with integer values and bounded by αk, βk ∈ R. This class of optimization problems may be termed as mixed integer multi-order cone programs (MIMOCPs).

6.4 Multi-oder cone programming problems under

uncertainty

In this section we define two-stage stochastic multi-order cone programs (SMCOPs) with

recourse to handle uncertainty in data defining (deterministic) MOCPs. Let r1, r2 ≥ 1 be

integers. For i = 1, 2, . . . , r1 and j = 1, 2, . . . , r2, let p1i, p2j ∈ [1, ∞] and m1, m2, n1, n2, n1i,

139 Pr1 Pr2 n2j be positive integers such that n1 = i=1 n1i and n2 = i=1 n2j. An SMOCP with re- course in primal standard form is defined based on deterministic data A ∈ Rm1×n1 , b ∈ Rm1 and c ∈ Rn1 and random data T ∈ Rm2×n1 ,W ∈ Rm2×n2 , h ∈ Rm2 and d ∈ Rn2 whose realizations depend on an underlying outcome ω in an event space Ω with a known prob- ability function P . Given this data, an SMOCP with recourse in primal standard form is min cTx + E [Q(x, ω)] s.t. Ax = b x  0, hp11,p12,...,p1r1 i where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

min d(ω)Ty s.t. W (ω)y = h(ω) − T (w)x y  0, hp21,p22,...,p2r2 i

where y ∈ Rn2 is the second-stage variable, and

Z E[Q(x, ω)] := Q(x, ω)P (dω). Ω

Two-stage stochastic pth(second)-order cone programs with recourse are a special case

of SMOCPs with p1i = p2j = p ≥ 1 (p1i = p2j = 2) for all i = 1, 2, . . . , r1 and j =

1, 2, . . . , r2.

6.5 An application

Our application is four versions the FLP (see Subsection 5.1.1). For these four versions we present problem descriptions leading to a MOCP model, a 0-1MOCP model, an MIMOCP

140 model, and an SMOCP model. As we mentioned in Subsection 5.1.1, FLPs can be classified based on the distance measure used in the model between the facilities. If we use the Euclidean distance then these problems are called Euclidean facility location problems (EFLPs), if we use the rec-

tilinear distance (also known as L1 distance, city block distance, or Manhattan distance) then these problems are called rectilinear facility location problems (RFLPs). Further- more, in some applications we use both the Euclidean and the rectilinear distances (based on the relationships between the pairs of facilities) as the distance measures used in the model between the facilities to get a mixed of EFLPs and RFLPs that we refer to as Euclidean-rectilinear facility location problems (ERFLPs). Another way of classification this problem is based on where we can place the new facilities in the solution space. When the new facilities can be placed any place in solution space, the problem is called a continuous facility location problem (CFLP), but usually the decision maker needs the new facilities to be placed at specific locations (called nodes) and not in any place in the solution space. In this case the problem is called a discrete facility location problem (DFLP). Each one of the next subsections is devoted to a version of ERFLPs. More specifi- cally, we consider (deterministic) continuous Euclidean-rectilinear facility location prob- lems (CERFLPs) which leads to an MOCP model, discrete Euclidean-rectilinear facility location problems (DERFLPs) which leads to a 0-1MOCP model, ERFLPs with integral- ity constraints which leads to an MIMOCP model, and stochastic continuous Euclidean- rectilinear facility location problems (stochastic CERFLPs) which leads to an SMOCP model.

141 6.5.1 CERFLPs—An MOCP model

In single ERFLPs, we are interested in choosing a location to build a new facility among existing facilities so that this location minimizes the sum of weighted (either Euclidean or rectilinear) distances to all existing facilities. Assume that we are given r + s existing facilities represented by the fixed points

n a1, a2,..., ar, ar+1, ar+2,..., ar+s in R , and we plan to place a new facility represented by x so that we minimize the weighted sum of the Euclidean distances between x and

each of the points a1, a2,..., ar and the weighted sum of the rectilinear distances between

x and each of the points ar+1, ar+2,..., ar+s. This leads us to the problem

Pr Pr+s min i=1 wi kx − aik2 + i=r+1 wi kx − aik1

or, alternatively, to the problem

Pr+s min i=1 wi ti

s.t. (t1; x − a1; ... ; tr; x − ar) rh2i 0

(tr+1; x − ar+1; ... ; tr+s; x − ar+s) sh1i 0,

where wi is the weight associated with the ith existing facility and the new facility for i = 1, 2, . . . , r + s.

n In multiple ERFLPs we add m new facilities, namely x1, x2,..., xm ∈ R , instead of adding only one. We have two cases depending whether or not there is an interaction among the new facilities in the underlying model. If there is no interaction between the new facilities, we are just concerned in minimizing the weighted sums of the distance between each one of the new facilities and each one of the fixed facilities. In other words, we solve the following MOCP model:

142 Pm Pr+s min j=1 i=1 wij tij

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m (6.5.1)

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m,

where wij is the weight associated with the ith existing facility and the jth new facility for j = 1, 2, . . . , m and i = 1, 2, . . . , r + s. If interaction exists among the new facilities, then, in addition to the above require- ments, we need to minimize the sum of the (either Euclidean or rectilinear) distances between each pair of the new facilities. Let 1 ≤ l ≤ m and assume that we are required to minimize the weighted sum of the Euclidean distances between each pair of the new

facilities x1, x2,..., xl and the weighted sum of the rectilinear distances between each

pair of the new facilities xl+1, xl+2,..., xm. In this case, we are interested in a model of the form:

Pm Pr+s Pm Pj−1 ˆ min j=1 i=1 wij tij + j=2 j0=1 wˆjj0 tjj0

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjl; xj − xl) (l−j)h2i 0, j = 1, 2, . . . , l − 1

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j)h1i 0, j = l + 1, 2, . . . , m − 1, (6.5.2)

0 0 wherew ˆjj0 is the weight associated with the new facilities j and j for j = 1, 2, . . . , j − 1 and j = 2, 3, . . . , m.

6.5.2 DERFLPs—A 0-1MOCP model

We consider the discrete version of the problem by assuming that the new facilities x1, x2,..., xm need to be placed at specific locations and not in any place in 2- or 3-

143 n (or higher) dimensional space. Let the points v1, v2,..., vk ∈ R represent these specific

locations where k ≥ m. So, we add the constraint xi ∈ {v1, v2,..., vk} for i = 1, 2, . . . , m. Clearly, for i = 1, 2, . . . , m, the above constraint can be replaced by the following linear and binary constraints:

xi = v1 yi1 + v2 yi2 + ··· + vk yik,

yi1 + yi2 + ··· + yik = 1, and

k yi = (yi1; yi2; ... ; yik) ∈ {0, 1} .

We also assume that we cannot place more than one facility at each location. Conse- quently, we add the following constaints:

(1; y1l; y2l; ... ; yml) h1i 0, for l = 1, 2, . . . , k.

If there is no interaction between the new facilities, then the MOCP model (6.5.1) becomes the following 0-1MOCP model:

Pm Pr+s min j=1 i=1 wij tij

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m

xi = v1 yi1 + v2 yi2 + ··· + vk yik, i = 1, 2, . . . , m

(1; y1l; y2l; ... ; yml) h1i 0, for l = 1, 2, . . . , k

T k 1 yi = 1, yi ∈ {0, 1} , i = 1, 2, . . . , m.

If interaction exists among the new facilities, then the MOCP model (6.5.2) becomes the

144 following 0-1MOCP model:

Pm Pr+s Pm Pj−1 ˆ min j=1 i=1 wij tij + j=2 j0=1 wˆjj0 tjj0

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjl; xj − xl) (l−j)h2i 0, j = 1, 2, . . . , l − 1

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j)h1i 0, j = l + 1, 2, . . . , m − 1

xi = v1 yi1 + v2 yi2 + ··· + vk yik, i = 1, 2, . . . , m

(1; y1l; y2l; ... ; yml) h1i 0, for l = 1, 2, . . . , k

T k 1 yi = 1, yi ∈ {0, 1} , i = 1, 2, . . . , m.

For l = 1, 2, . . . , k, let zl = 1 if the location vi is chosen, and 0 otherwise. Then, we can go further, and consider more assumptions: Let k1, k2, k3, k4 ∈ [1, k] be integers such that

k1 ≤ k2 and k3 ≤ k4. If we must choose at most k1 of the locations v1, v2,..., vk2 , then we impose the constraints:

k (k1; z1; z2; ... ; zk2) h1i 0, and z ∈ {0, 1} .

If we must choose at most k1 of the locations v1, v2,..., vk2 , or at most k3 of the locations v1, v2,..., vk4 , then we impose the constraints:

k (k1f; z1; z2; ... ; zk2 ) h1i 0, (k3(1 − f); z1; z2; ... ; zk4 ) h1i 0, z ∈ {0, 1} , and f ∈ {0, 1}.

6.5.3 ERFLPs with integrality constraints—An MIMOCP model

In some problems we may need the locations to have integer-valued coordinates. In most cities, streets are laid out on a grid, so that city is subdivided into small numbered blocks that are square or rectangular. In this case, usually the decision maker needs the new

145 facility to be placed at the corners of the city blocks. Thus, for each i ∈ ∆ ⊂ {1, 2, . . . , m},

n let us assume that the variable xi lies in the hyperrectangle Ξi ≡ {xi : ζi ≤ xi ≤ ηi, ζi ∈ n n n T n R , ηi ∈ R } and has to be integer-valued, i.e. xi must be in the grid Ξi Z . Thus, if there is no interaction between the new facilities, then instead of solving the MOCP model (6.5.1), we solve the following MIMOCP model:

Pm Pr+s Pm Pj−1 ˆ min j=1 i=1 wij tij + j=2 j0=1 wˆjj0 tjj0

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m

n T n xk ∈ Ξk Z , k ∈ ∆.

If interaction exists among the new facilities, then instead of solving the MOCP model (6.5.2), we solve the following MIMOCP model:

Pm Pr+s Pm Pj−1 ˆ min j=1 i=1 wij tij + j=2 j0=1 wˆjj0 tjj0

s.t. (t1j; xj − a1; ... ; trj; xj − ar) rh2i 0, j = 1, 2, . . . , m

(t(r+1)j; xj − ar+1; ... ; t(r+s)j; xj − ar+s) sh1i 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjl; xj − xl) (l−j)h2i 0, j = 1, 2, . . . , l − 1

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j)h1i 0, j = l + 1, 2, . . . , m − 1

n T n xk ∈ Ξk Z , k ∈ ∆.

6.5.4 Stochastic CERFLPs—An SMOCP model

Before we describe the stochastic version of this generic application, we indicate a more concrete version of it. Assume that we have a new growing city with many suburbs and we want to build a hospital for treating the residents of this city. Some people live in the city at the present time. As the city expands, many houses in new suburbs need to be built and the locations of these suburbs will be known in the future in different parts of the city. Our goal is to find the best location of this hospital so that it can serve the

146 current suburbs and the new ones. This location must be determined at the current time and before information about the locations of the new suburbs become available. For those houses that are close enough to the location of the hospital, we use the rectilinear distance as a distance measure between them and the hospital, while for the new suburbs that that will be located far way from the location of the hospital, we use the Euclidean distance as a distance measure between them and the hospital. See Figure 6.1.

Figure 6.1: A more concrete version of the stochastic CERFLP: A new growing city with many expected building houses in different possible sides of the city.

n Generally speaking, let a1, a2,..., ar1 , ar1+1, ar1+2,..., ar1+s1 be fixed points in R representing the coordinates of r1 +s1 existing fixed facilities and a˜1(ω), a˜2(ω),..., a˜r2 (ω),

n a˜r2+1(ω), a˜r2+2(ω),..., a˜r2+s2 (ω) be random points in R representing the coordinates of r2 + s2 random facilities whose realizations depends on an underlying outcome ω in an event space Ω with a known probability function P .

Suppose that at present we do not know the realizations of r2 + s2 random facilities, and that at some point in time in future the realizations of these r2 + s2 random facilities

147 become known.

n Our goal is to locate m new facilities x1, x2,..., xm ∈ R that minimize the the following sums:

• the weighted sums of the Euclidean distance between each one of the new facilities

and each one of the fixed facilities a1, a2,..., ar1 ;

• the weighted sums of the rectilinear distance between each one of the new facilities

and each one of the fixed facilities ar1+1, ar1+2,..., ar1+s1 ;

• the expected weighted sums of the Euclidean distance between each one of the new

facilities and each one of the random facilities a˜1(ω), a˜2(ω),..., a˜r2 (ω);

• the expected weighted sums of the rectilinear distance between each one of the new

facilities and each one of the random facilities a˜r2+1(ω), a˜r2+2(ω),..., a˜r2+s2 (ω).

Note that this decision needs to be made before the realizations of the r2 + s2 random facilities become available. If there is no interaction between the new facilities, then the (deterministic) MOCP model (6.5.1) becomes the following SMOCP model:

Pm Pr1+s1 min j=1 i=1 wij tij + E [Q(x1; ... ; xm, ω)]

s.t. (t1j; xj − a1; ... ; tr1j; xj − ar1 ) r1h2i 0, j = 1, 2, . . . , m

(t(r1+1)j; xj − ar1+1; ... ; t(r1+s1)j; xj − ar1+s1 ) s1h1i 0, j = 1, 2, . . . , m,

where Q(x1; ... ; xm, ω) is the minimum value of the problem

Pm Pr2+s2 ˜ min j=1 i=1 w˜ij(ω) tij ˜ ˜ s.t. (t1j; xj − a˜1(ω); ... ; tr2j; xj − a˜r2 (ω)) r2h2i 0, j = 1, 2, . . . , m ˜ ˜ (t(r2+1)j; xj − a˜(r2+1)(ω); ... ; t(r2+s2)j; xj − a˜(r2+s2)(ω)) s2h1i 0, j = 1, 2, . . . , m,

148 and Z E[Q(x1; ... ; xm, ω)] := Q(x1; ... ; xm, ω)P (dω), Ω

where wij is the weight associated with the ith existing facility and the jth new facility for

j = 1, 2, . . . , m and i = 1, 2, . . . , r1 + s1, andw ˜ij(ω) is the weight associated with the ith

random existing facility and the jth new facility for j = 1, 2, . . . , m and i = 1, 2, . . . , r2+s2. If interaction exists among the new facilities, then the (deterministic) MOCP model (6.5.2) becomes the following SMOCP model:

Pm Pr1+s1 Pm Pj−1 ˆ min j=1 i=1 wij tij + j=2 j0=1 wˆjj0 tjj0 + E [Q(x1; ... ; xm, ω)]

s.t. (t1j; xj − a1; ... ; tr1j; xj − ar1 ) r1h2i 0, j = 1, 2, . . . , m

(t(r1+1)j; xj − ar1+1; ... ; t(r1+s1)j; xj − ar1+s1 ) s1h1i 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjl; xj − xl) (l−j)h2i 0, j = 1, 2, . . . , l − 1

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j)h1i 0, j = l + 1, 2, . . . , m − 1,

where Q(x1; ... ; xm, ω) is the minimum value of the problem

Pm Pr2+s2 ˜ Pm Pj−1 ˆ min j=1 i=1 w˜ij(ω) tij + j=2 j0=1 wˆjj0 tjj0 ˜ ˜ s.t. (t1j; xj − a˜1(ω); ... ; tr2j; xj − a˜r2 (ω)) r2h2i 0, j = 1, 2, . . . , m ˜ ˜ (t(r2+1)j; xj − a˜(r2+1)(ω); ... ; t(r2+s2)j; xj − a˜(r2+s2)(ω)) s2h1i 0, j = 1, 2, . . . , m

(tˆj(j+1); xj − xj+1; ... ; tˆjl; xj − xl) (l−j)h2i 0, j = 1, 2, . . . , l − 1

(tˆj(j+1); xj − xj+1; ... ; tˆjm; xj − xm) (m−j)h1i 0, j = l + 1, 2, . . . , m − 1,

and Z E[Q(x1; ... ; xm, ω)] := Q(x1; ... ; xm, ω)P (dω), Ω

0 0 wherew ˆjj0 is the weight associated with the new facilities j and j for j = 1, 2, . . . , j − 1 and j = 2, 3, . . . , m.

149 Chapter 7

Conclusion

In this dissertation we have introduced the two-stage stochastic symmetric programming (SSP) with recourse. While one can take the view point that SSPs are a way of dealing with uncertainty in data defining deterministic symmetric programs, it is also possible to take the dual viewpoint that SSPs are a way of allowing any symmetric cone in place of the nonnegative orthant cone in stochastic linear programs, or the positive semidefinite cone in stochastic semidefinite programs. It is also seen that the SSP problem includes some important general classes of optimization problems (Problems 1-6) as special cases. We have presented a logarithmic barrier interior point algorithm for solving SSPs using Bender’s decomposition. We have proved the convergence results by showing that the logarithmic barrier associated with the recourse function is a strongly self-concordant barrier on the first-stage solutions. We have described and analyzed short- and long- step variance of the algorithm that follow the primal central trajectory of the first-stage

problem. We concluded that, for given two symmetric cones with ranks r1 and r2 in the first- and second-stage problems, respectively, and for K number of realizations, we need √ 0 at most O( r1 + Kr2 ln(µ /)) Newton iterations in the short-step class of the algorithm to follow the first-stage central path from a starting value of the barrier µ0 to a terminating

0 value . We have also shown that at most O((r1 + Kr2) ln(µ /)) Newton iterations for

150 this recentering in the long-step class of the algorithm. An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40]. In this dissertation, we have also presented a class of volumetric barrier decomposition algorithms for SSPs. We have proved the convergence results by showing that the volumetric barrier associated with the recourse function is a strongly self-concordant barrier on the first- stage solutions. We have also described and analyzed short- and long-step variant of the algorithm that follow the primal central trajectory of the first-stage problem. For

given two symmetric cones with ranks r1 and r2 and dimensions n1 and n2 in the first- and second-stage problems, respectively, and for K number of realizations, we have seen that in order to follow the first-stage central path from a starting value of the barrier µ0

p 0 to a terminating value , we need at most O( (1 + K)(m1c1 + m2c2) ln(µ /)) Newton iterations in the short-step class of the algorithm. We have also shown that at most

0 O((1+K)(m1c1+m2c2) ln(µ /)) Newton iterations in the long-step class of the algorithm. Chen and Mehrotra [16] have proposed a prototype primal interior point decomposi- tion algorithm for the two-stage stochastic convex optimization problem. Since stochastic symmetric programming is a subclass of general stochastic convex optimization, our prob- lem can be solved by using their prototype algorithm, but doing this is not a good idea, because solving SSPs as convex programs without exploiting the special structure of the symmetric cone is not advisable both theoretically and practically. Therefore, a separate study of SSP algorithms is warranted. Concerning applications, we have presented four application areas that lead to SSPs when the underlying symmetric cones are second-order cones and rotated quadratic cones. This has been simply accomplished by applying a relaxation on assumptions that include deterministic data to include random data. Problems 1-6 in Chapter 2 are all important special cases of the SSP problem. So, it is interesting to investigate other applicational settings leading to these general problems, such as the ones obtained when the underlying symmetric cones are real symmetric or

151 complex Hermitian positive semidefinite matrices. Concerning implementation of the proposed SSP algorithms, it will be interesting to implement these methods in the future to determine their practical effects. We there- fore refer ourselves to a paper by Mehrotra and Ozevin¨ [26] where they develop efficient practical implementations of their SSDP algorithm proposed in [25]. We have ended the dissertation by introducing deterministic and stochastic multi- order cone programs (MOCPs) as new conic optimization problems and to be natural extensions of deterministic and stochastic second-order cone programs. We presented an application leading to deterministic, 0-1, integer, and stochastic multi-order cone pro- grams. The MOCP problems are over non-symmetric cones, hence we left such problems as “logarithmically unsolved” open problems so as to contemplate for our future research in conic programming.

152 Bibliography

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., Ser. B 95:3–51, 2003.

[2] B. Alzalg. Stochastic second-order cone programming: Application models, Submit- ted.

[3] B. Alzalg. Stochastic symmetric optimization. Submitted.

[4] B. Alzalg and K. A. Ariyawansa. A class of polynomial volumetric barrier decompo- sition algorithms for stochastic symmetric programming, Submitted.

[5] K. M. Anstreicher. On Vaidya’s volumetric cutting plane method for convex pro- gramming. Mathematics of Operations Research, 22(1):63–89, 1997.

[6] K. M. Anstreicher. Volumetric path following algorithms for linear programming. Mathematical Programming, 76(1B):245–263, 1997.

[7] K. M. Anstreicher. The volumetric barrier for semidefinite programming. Mathemat- ics of Operations Research, 25(3):365–380, 2000.

[8] K. A. Ariyawansa and A. J. Felt. On a new collection of stochastic linear programming test problems. Informs Journal on Computing, 16(3):291–299, 2004.

[9] K. A. Ariyawansa and Y. Zhu. Stochastic semidefinite programming: A new paradigm for stochastic optimization. 4OR—The Quarterly Journal of the Belgian, French and Italian OR Societies, 4(3):65–79, 2006.

[10] K. A. Ariyawansa and Y. Zhu. A class of polynomial volumetric barrier decomposition algorithms for stochastic semidefinite programming. Mathematics of computation, 80:1639–1661, 2011.

[11] V. S. Bawa, S. J. Brown, and R. W. Klein. Estimation risk and optimal portfolio choice: A survey, Proceedings of the American Statistical Association, 53–58, 1979.

[12] A. Ben-Tal and M. P. Bendsøe. A new method for optimal truss topology design. SIAM J. Optimization, 3(2):322–358, 1993.

153 [13] A. Ben-Tal and A. Nemirovski. Interior point polynomial-time method for truss topology design, Technical report, Faculty of Industrial Engineering and Mangement, Technion, 1992.

[14] D. Bertsimas and J. Tsitsiklis. Introduction to linear optimization, Athena Scientific, 1997.

[15] J. R. Birge. Stochastic programming computation and applications. INFORMS Jour- nal on Computing, 9:111–133, 1997.

[16] M. Chen and S. Mehrotra. Self-concordance tree and decomposition based interior point methods for the two stage stochastic convex optimization problem. Technical report, Northwestern University, 2007.

[17] M. A. H. Dempster. Stochastic Programming. Academic Press, London, UK, 1980.

[18] J. Faraut and A. Kor´anyi. Analysis on Symmetric Cones. Oxford University Press, Oxford, UK, 1994.

[19] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Uncon- strained Minimization Techniques. John Wiley, New York, NY, 1968.

[20] C. Helmberg, F. Rendl, R. J. Vanderbei, H. Wolkowicz. An interior-point methods for stochastic semidefinite programming. SIAM J. of Optimization, 6:342–361, 1996.

[21] P. L. Jiang. Polynomial Cutting Plane Algorithms for Stochastic Programming and Related Problems. Ph.D. dissertation, Department of Pure and Applied Mathematics, Washington State University, Pullman, WA, 1997.

[22] M. Kojima, S. Shindoh, S. Hara. Interior-point methods for the monotone linear complementarity problem in symmetric matrices. SIAM J. of Optimization, 7(9):86– 125, 1997.

[23] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Appl., 284:193–228, 1998.

[24] F. Maggioni, F.A. Potra, M.I. Bertocchi, and E. Allevi. Stochastic second-order cone programming in mobile ad hoc networks. J. Optim. Theory Appl., 143:309–328, 2009.

[25] S. Mehrotra and M. G. Ozevin.¨ Decomposition-based interior point methods for two- stage stochastic semidefinite programming. SIAM J. of Optimization, 18(1):206–222, 2007.

[26] S. Mehrotra, and M. Ozevin.¨ On the implementation of interior point decomposition algorithms for two-stage stochastic conic programs. SIAM J. Optim., 19(4):1846– 1880, 2008.

154 [27] S. Mehrotra, and M. Ozevin.¨ Decomposition based interior point methods for two- stage stochastic convex quadratic programs with recourse. Operations Research, 54(4):964–974, 2009.

[28] R. D. Monteiro. Primal-dual path-following algorithms for semidefinite programming. SIAM J. Optim., 7:663–678, 1997.

[29] Yu. E. Nesterov and A. S. Nemirovskii. Conic formulation of a convex programming problem and duality. Optimization Methods and Software, 1(2):95–115 , 1992.

[30] Yu. E. Nesterov and A. S. Nemirovski. Interior Point Polynomial Algorithms in Convex Programming., SIAM Publications. SIAM, Philadelphia, PA, USA, 1994.

[31] A. Pr´ekopa. On probabilistic constrained programming. Proceedings of the Princeton Symposium on Mathematical Programming. 113–138, Princeton Univ. Press, Prince- ton, NJ, USA, 1970.

[32] A. Pr´ekopa. Stochastic Programming. Kluwer Academic Publishers, Boston, MA, USA, 1995.

[33] A. Pr´ekopa. The use of discrete moment bounds in probabilistic constrained stochastic programming models. Ann. Oper. Res., 85:21–38, 1999.

[34] B. K. Rangarajan. Polynomial convergence of infeasible-interior point methods over symmetric cones. SIAM J. of Optimization, 16(4):1211–1229, 2006.

[35] S. H. Schmieta and F. Alizadeh. Associative and Jordan algebras, and polynomial time interior point algorithms for symmetric cones. Mathematics of Operations Re- search, 26(3):543–564, 2001.

[36] S. H. Schmieta and F. Alizadeh. Extension of primal-dual interior point methods to symmetric cones. Math. Program., Ser. A 96:409–438, 2003.

[37] P. Sun and R. M. Freund. Computation of minimum-cost covering ellipsoids. Opera- tions Research, 52(5):690–706, 2004.

[38] M. J. Todd. Semidefinite optimization. ACTA Numerica, 10:515–560, 2001.

[39] J. A. Tompkins, J. A. White and Y. A. Bozer, J. M. Tanchoco Facilities Planning, 3rd edn. Wiley, Chichester, 2003.

[40] P. M. Vaidya. A new algorithm for minimizing convex functions over a convex set. Math. Program., Ser. A, 73:291–341, 1996.

[41] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38:49–95, 1996.

[42] R. J. Vanderbei and H. Yurittan. Using LOQO to solve second-order cone program- ming problems. Report SOR 98-09, Princeton University, Princeton, USA, 1998.

155 [43] D. S. Watkins. Fundamentals of Matrix Computations. 3rd edn. Wiley, New York, 2010.

[44] R. J-B. Wets. Stochastic programming: Solution techniques and approximation schemes. Mathematical Programming: The State of the Art, 1982, A. Bachem, M. Groeschel, and B. Korte, eds., Springer-Verlag, Berlin, 566–603, 1982.

[45] M. M. Wiecek. Advances in cone-based preference modeling for decision making with multiple criteria. Decision Making in Manufacturing and Services, 1(1-2):153–173, 2007.

[46] F. Zhang. Quaternions and matrices of quaternions. Linear Algebra Appl., 251(2):21– 57, 1997.

[47] Y. Zhang. On extending primal-dual interior-point algorithms from linear programing to semidefinite programming. SIAM J. Optim., 8:356–386, 1998.

[48] G. Zhao. A log-barrier method with Benders decomposition for solving two-stage stochastic linear programs. Math. Program., Ser. A, 90:507–536, 2001.

[49] Y. Zhu and K. A. Ariyawansa. A preliminary set of applications leading to stochastic semidefinite programs and chance-constrained semidefinite programs. Applied Math- ematical Modelling, 35(5):2425–2442, 2011.

156