CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

A Dissertation Submitted for the Award of the Degree of Master of Philosophy in Mathematics by

Neelam Patel

School of Mathematics
Devi Ahilya Vishwavidyalaya (NAAC Accredited Grade "A")
Indore (M.P.)
2012–2013

Contents

Introduction
Chapter 1: Preliminaries
Chapter 2: Constraint Qualifications
Chapter 3: Lagrangian Duality & Saddle Point Optimality Conditions
References

Introduction

The dissertation is a study of constraint qualifications, Lagrangian duality and saddle point optimality conditions. In fact, it is a reading of chapters 5 and 6 of [1]. The first chapter is about preliminaries: we collect results which are useful in subsequent chapters, such as the Fritz John necessary and sufficient conditions for optimality and the Karush-Kuhn-Tucker necessary and sufficient conditions for optimality. In the second chapter we define the cone of tangents T and show that F0 ∩ T = ∅ is a necessary condition for local optimality. The constraint qualifications defined are Abadie's, Slater's, Cottle's, Zangwill's, Kuhn-Tucker's and the linear independence constraint qualification. We shall prove

SCQ ⇒ CCQ ⇒ ZCQ ⇒ KTCQ ⇒ ACQ, and LICQ ⇒ CCQ.

We derive the KKT conditions under various constraint qualifications. Further, we study the various constraint qualifications and their interrelationships. In the third chapter, we define the Lagrangian dual problem and give its geometric interpretation. We prove the weak and strong duality theorems. We also develop the saddle point optimality conditions and their relationship with the KKT conditions. Further, some important properties of the dual function, such as concavity, differentiability and subdifferentiability, are discussed. Special cases of Lagrangian duality for linear and quadratic programs are also discussed.

Chapter 1 Preliminaries

We collect definitions and results which will be useful.

Definition 1.1 (Convex function): Let f : S → R, where S is a nonempty convex set in Rⁿ. The function f is said to be convex on S if

f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2) for each x1, x2 ∈ S and for each λ ∈ (0, 1).

Definition 1.2 (Pseudoconvex function): Let S be a nonempty open set in Rⁿ, and let f : S → R be differentiable on S. The function f is said to be pseudoconvex if for each x1, x2 ∈ S with ∇f(x1)ᵗ(x2 − x1) ≥ 0 we have f(x2) ≥ f(x1).

Definition 1.3 (Strictly pseudoconvex function): Let S be a nonempty open set in Rⁿ, and let f : S → R be differentiable on S. The function f is said to be strictly pseudoconvex if for each x1 ≠ x2 with ∇f(x1)ᵗ(x2 − x1) ≥ 0 we have f(x2) > f(x1).

Definition 1.4 (Quasiconvex function): Let f : S → R, where S is a nonempty convex set in Rⁿ. The function f is said to be quasiconvex if, for each x1 and x2 ∈ S,

f(λx1 + (1−λ)x2) ≤ max {f(x1), f(x2)} for each λ ∈ (0, 1).

Notation 1.5:
F0 = {d : ∇f(x0)ᵗd < 0}
The cone of feasible directions:
D = {d : d ≠ 0, x0 + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}

Theorem 1.6: Consider the problem to minimize f(x) subject to x ∈ S, where f : Rⁿ → R and S is a nonempty set in Rⁿ. Suppose f is differentiable at x0 ∈ S. If x0 is a local minimum, then F0 ∩ D = ∅. Conversely, suppose F0 ∩ D = ∅, f is pseudoconvex at x0, and there exists an ε-neighborhood Nε(x0), ε > 0, such that d = (x − x0) ∈ D for any x ∈ S ∩ Nε(x0). Then x0 is a local minimum of f.

Lemma 1.7: Consider the feasible region S = {x ∈ X : gi(x) ≤ 0 for i = 1,…,m}, where X is a nonempty open set in Rⁿ, and where gi : Rⁿ → R for i = 1,…,m. Given a feasible point x0 ∈ S, let I = {i : gi(x0) = 0} be the index set for the binding or active constraints, and assume that the gi's for i ∈ I are differentiable at x0 and that the gi's for i ∉ I are continuous at x0. Define the sets
G0 = {d : ∇gi(x0)ᵗd < 0 for each i ∈ I}  [the cone of interior directions at x0]
G′ = {d ≠ 0 : ∇gi(x0)ᵗd ≤ 0 for each i ∈ I}

Then, we have

G0 ⊆ D ⊆ G′

Theorem 1.8: Consider the Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m, where X is a nonempty open set in Rⁿ, f : Rⁿ → R, and gi : Rⁿ → R for i = 1,…,m. Let x0 be a feasible point, and denote I = {i : gi(x0) = 0}. Furthermore, suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. If x0 is a local optimal solution, then F0 ∩ G0 = ∅. Conversely, if F0 ∩ G0 = ∅, and if f is pseudoconvex at x0 and gi for i ∈ I are strictly pseudoconvex over some ε-neighborhood of x0, then x0 is a local minimum.

Theorem 1.9 (The Fritz John Necessary Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider the Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be a feasible solution, and denote I = {i : gi(x0) = 0}. Furthermore, suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. If x0 locally solves Problem P, then there exist scalars u0 and ui for i ∈ I such that

u0∇f(x0) + ∑_{i∈I} ui∇gi(x0) = 0
u0, ui ≥ 0 for i ∈ I
(u0, uI) ≠ (0, 0)

where uI is the vector whose components are ui for i ∈ I. Furthermore, if gi for i ∉ I are also differentiable at x0, then the foregoing conditions can be written in the following equivalent form:

u0∇f(x0) + ∑_{i=1}^{m} ui∇gi(x0) = 0
uigi(x0) = 0 for i = 1,…,m
u0, ui ≥ 0 for i = 1,…,m
(u0, u) ≠ (0, 0)

where u is the vector whose components are ui for i = 1,…,m.

Theorem 1.10 (Fritz John Sufficient Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider the Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be an FJ solution and denote I = {i : gi(x0) = 0}. Define S as the relaxed feasible region for Problem P in which the nonbinding constraints are dropped.

a. If there exists an ε-neighborhood Nε(x0), ε > 0, such that f is pseudoconvex over Nε(x0) ∩ S and gi, i ∈ I are strictly pseudoconvex over Nε(x0) ∩ S, then x0 is a local minimum for Problem P.

b. If f is pseudoconvex at x0 and if gi, i ∈ I are both strictly pseudoconvex and quasiconvex at x0, then x0 is a global optimal solution for Problem P. In particular, if these generalized convexity assumptions hold true only after restricting the domain of f to Nε(x0) for some ε > 0, then x0 is a local minimum for Problem P.

Theorem 1.11 (Karush-Kuhn-Tucker Necessary Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider the Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be a feasible solution, and denote I = {i : gi(x0) = 0}. Suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. Furthermore, suppose ∇gi(x0) for i ∈ I are linearly independent. If x0 locally solves Problem P, then there exist scalars ui for i ∈ I such that

∇f(x0) + ∑_{i∈I} ui∇gi(x0) = 0
ui ≥ 0 for i ∈ I

In addition to the above assumptions, if gi for each i ∉ I is also differentiable at x0, then the foregoing conditions can be written in the following equivalent form:

∇f(x0) + ∑_{i=1}^{m} ui∇gi(x0) = 0
uigi(x0) = 0 for i = 1,…,m
ui ≥ 0 for i = 1,…,m

Theorem 1.12 (Karush-Kuhn-Tucker Sufficient Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider the Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m.

Let x0 be a KKT solution, and denote I = {i : gi(x0) = 0}. Define S as the relaxed feasible region for Problem P in which the constraints that are not binding at x0 are dropped. Then,

a. If there exists an ε-neighborhood Nε(x0), ε > 0, such that f is pseudoconvex over Nε(x0) ∩ S and gi, i ∈ I are differentiable at x0 and quasiconvex over Nε(x0) ∩ S, then x0 is a local minimum for Problem P.

b. If f is pseudoconvex at x0, and if gi, i ∈ I are differentiable and quasiconvex at x0, then x0 is a global optimal solution to Problem P. In particular, if this assumption holds true with the domain of f restricted to Nε(x0) for some ε > 0, then x0 is a local minimum for P.

Theorem 1.13 (Farkas' Lemma): Let A be an m × n matrix and c an n-vector. Then exactly one of the following two systems has a solution:
System 1: Ax ≤ 0 and cᵗx > 0 for some x ∈ Rⁿ
System 2: Aᵗy = c and y ≥ 0 for some y ∈ Rᵐ

Theorem 1.14 (Gordan's Theorem): Let A be an m × n matrix. Then exactly one of the following systems has a solution:
System 1: Ax < 0 for some x ∈ Rⁿ
System 2: Aᵗp = 0 and p ≥ 0 for some nonzero p ∈ Rᵐ

Theorem 1.15 (Separation theorem): Let S be a nonempty closed convex set in Rⁿ and y ∉ S. Then there exist a nonzero vector p and a scalar α such that pᵗy > α and pᵗx ≤ α for each x ∈ S.

Corollary 1.16 (Existence of a supporting hyperplane): Let S be a nonempty convex set in Rⁿ and x0 ∉ int S. Then there is a nonzero vector p such that pᵗ(x − x0) ≤ 0 for each x ∈ cl S.

Lemma 1.17: Let f : Rⁿ → R be a convex function. Consider any point x0 ∈ Rⁿ and a nonzero direction d ∈ Rⁿ. Then the directional derivative f′(x0; d) of f at x0 in the direction d exists.

Theorem 1.18: Let S be a nonempty convex set in Rⁿ, and let f : S → R be convex. Then, for x0 ∈ int S, there exists a vector ξ such that the hyperplane

H = {(x, y) : y = f(x0) + ξᵗ(x − x0)}

supports epi f at (x0, f(x0)). In particular,

f(x) ≥ f(x0) + ξᵗ(x − x0) for each x ∈ S

i.e., ξ is a subgradient of f at x0.

Theorem 1.19: Let S be a nonempty convex set in Rⁿ, and let f : S → R be convex on S. Consider the problem to minimize f(x) subject to x ∈ S. Suppose that x0 ∈ S is a local optimal solution to the problem.

1. Then, x0 is a global optimal solution.

2. If either x0 is a strict local minimum or f is strictly convex, then x0 is the unique global optimal solution, and it is also a strong local minimum.

Theorem 1.20: Let f : Rⁿ → R be a convex function, and let S be a nonempty compact polyhedral set in Rⁿ. Consider the problem to maximize f(x) subject to x ∈ S. An optimal solution x0 to the problem then exists, where x0 is an extreme point of S.
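Before moving to constraint qualifications, note that Farkas' lemma (theorem 1.13) lends itself to a quick numerical illustration. The following is a minimal sketch, assuming SciPy is available; the matrix A and vector c are hypothetical data, and exactly one of the two systems should report solvable.

```python
# Numerical illustration of Farkas' lemma (theorem 1.13), assuming SciPy.
# Exactly one of System 1 and System 2 should be solvable.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # hypothetical data
c = np.array([1.0, 1.0])

# System 2: A^t y = c, y >= 0 -- a pure feasibility LP (zero objective).
res2 = linprog(c=np.zeros(A.shape[0]), A_eq=A.T, b_eq=c,
               bounds=[(0, None)] * A.shape[0])

# System 1: Ax <= 0, c^t x > 0.  The solution set is a cone, so it suffices
# to maximize c^t x over the box [-1, 1]^n and test for a positive value.
res1 = linprog(c=-c, A_ub=A, b_ub=np.zeros(A.shape[0]),
               bounds=[(-1, 1)] * A.shape[1])

print("System 2 solvable:", res2.status == 0)
print("System 1 solvable:", res1.status == 0 and -res1.fun > 1e-9)
```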

Chapter 2 Constraint Qualifications

Consider a problem
P: Minimize f(x)
Subject to x ∈ X
 gi(x) ≤ 0, i = 1, 2,…,m

Usually, the Fritz John necessary conditions (at local optimality) are derived first. Then, under certain constraint qualifications, it is asserted that the multiplier associated with the objective function is positive at a local minimum. These are called the Karush-Kuhn-Tucker (KKT) necessary conditions.

In view of theorem 1.8, local optimality implies that F0 ∩ G0 = ∅, which implies the Fritz John conditions. Under the linear independence constraint qualification or, more generally, G0 ≠ ∅, we deduce that the Fritz John conditions can only be satisfied if the Lagrangian multiplier associated with the objective function is positive. This leads to the KKT conditions.

Local optimality ⇒ F0 ∩ D = ∅ ⇒ FJ conditions ⇒ KKT conditions
(Theorem 1.6)  (Theorem 1.8)  (constraint qualification)

Below, we first show that a necessary condition for local optimality is that F0 ∩ T = ∅, where T is the cone of tangents. Using the constraint qualification T = G′, we then get F0 ∩ G′ = ∅. Further, using Farkas' lemma (1.13), we get the KKT conditions.

Local optimality ⇒ F0 ∩ T = ∅ ⇒ F0 ∩ G′ = ∅ ⇒ KKT conditions
(Theorem 2.5)  (Theorem 2.7)  (Farkas' Lemma)

Definition 2.1 (The cone of tangents of S at x0): Let S be a nonempty set in Rⁿ, and let x0 ∈ cl S. The cone of tangents of S at x0, denoted by T, is the set of all directions d such that

d = lim_{k→∞} λk(xk − x0)

where λk > 0, xk ∈ S for each k, and xk → x0.

Note 2.2: It is clear that d belongs to the cone of tangents if there is a feasible sequence {xk} converging to x0 such that the directions of the chords xk − x0 converge to d.

Remark 2.3 (Alternative equivalent descriptions): The cone of tangents T can be equivalently characterized in either of the following ways:
a) T = {d : there exist a sequence μk ↓ 0⁺ and a function α : R → Rⁿ, with α(μ) → 0 as μ ↓ 0, such that xk = x0 + μkd + μkα(μk) ∈ S for each k}
b) T = {d : d = lim_{k→∞} (xk − x0)/μk, where μk ↓ 0⁺, xk ∈ S, xk → x0 and xk ≠ x0 for each k}

Proof: For (a), setting α(μk) = (xk − x0)/μk − d, we have

α(μk) → 0 as k → ∞ ⟺ d = lim_{k→∞} λk(xk − x0), where λk = 1/μk > 0, xk ∈ S and xk → x0

For (b), write λk = 1/μk in the definition.

Proposition 2.4: T is closed.
Proof: T is a collection of limits; a standard diagonalization argument shows that a limit of elements of T is again such a limit, so T is closed.

Here we develop the KKT conditions directly, without first deriving the Fritz John conditions. This is done under various constraint qualifications.

Theorem 2.5: Let S be a nonempty set in Rⁿ and let x0 ∈ S. Furthermore, suppose that f : Rⁿ → R is differentiable at x0. If x0 locally solves the problem to minimize f(x) subject to x ∈ S, then F0 ∩ T = ∅, where F0 = {d : ∇f(x0)ᵗd < 0} and T is the cone of tangents of S at x0.

Proof: Let d ∈ T, i.e.,

d = lim_{k→∞} λk(xk − x0)

where λk > 0, xk ∈ S for each k, and xk → x0. By the differentiability of f at x0, we get

f(xk) − f(x0) = ∇f(x0)ᵗ(xk − x0) + ‖xk − x0‖ α(x0; xk − x0)  (2.1)

where α(x0; xk − x0) → 0 as xk → x0.

By the local optimality of x0, we have f(xk) ≥ f(x0) for k sufficiently large. By (2.1), we get

∇f(x0)ᵗ(xk − x0) + ‖xk − x0‖ α(x0; xk − x0) ≥ 0

Multiplying by λk > 0 and taking the limit as k → ∞, the above inequality implies that

∇f(x0)ᵗd ≥ 0
⇒ d ∉ F0
⇒ F0 ∩ T = ∅

Note 2.6: The condition F0 ∩ T = ∅ is not sufficient for local optimality of x0. This condition holds trivially whenever F0 = ∅, which is not sufficient for local optimality. However, if there exists an ε-neighborhood Nε(x0) about x0 such that Nε(x0) ∩ S is convex and f is pseudoconvex over Nε(x0) ∩ S, then F0 ∩ T = ∅ is sufficient to claim that x0 is a local minimum.

Abadie Constraint Qualification: T = G′

Theorem 2.7 (KKT Necessary Conditions): Let X be a nonempty set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R, i = 1,…,m. Consider the problem

Minimize f(x)

Subject to gi(x) ≤ 0, i = 1,…,m
 x ∈ X

Let x0 be a feasible solution and let I = {i : gi(x0) = 0}. Suppose f and gi, i ∈ I are differentiable at x0. Furthermore, suppose the constraint qualification T = G′ holds. If x0 is a local optimal solution, then there exist nonnegative scalars ui for i ∈ I such that

∇f(x0) + ∑_{i∈I} ui∇gi(x0) = 0

Proof: By theorem 2.5 above, we have F0 ∩ T = ∅. By assumption T = G′, so that F0 ∩ G′ = ∅. This means the following system has no solution:

∇f(x0)ᵗd < 0, ∇gi(x0)ᵗd ≤ 0 for i ∈ I

By Farkas' lemma, the following system has a nonnegative solution:

∇f(x0) + ∑_{i∈I} ui∇gi(x0) = 0
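To make theorem 2.7 concrete, the following minimal sketch (assuming NumPy/SciPy, with a hypothetical toy problem) recovers the KKT multipliers at a known minimizer by solving ∇f(x0) + ∑ ui∇gi(x0) = 0, ui ≥ 0, as a nonnegative least-squares problem.

```python
# Recovering KKT multipliers at a known minimizer (a sketch, assuming SciPy).
# Toy problem: minimize x1^2 + x2^2 subject to g1(x) = 1 - x1 <= 0.
import numpy as np
from scipy.optimize import nnls

x0 = np.array([1.0, 0.0])                  # known local minimum
grad_f = 2 * x0                            # gradient of f at x0
grad_g = np.array([[-1.0], [0.0]])         # columns: gradients of active g_i

# Solve grad_f + sum_i u_i grad_g_i = 0 with u >= 0 in the least-squares sense.
u, residual = nnls(grad_g, -grad_f)
print("multipliers u =", u)          # expected: [2.0]
print("KKT residual  =", residual)   # ~0 confirms x0 is a KKT point
```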

Example 2.8:

Minimize −x1
Subject to x2 − (1 − x1)³ ≤ 0
 −x2 ≤ 0

The optimal point is x0 = (1, 0), and I = {1, 2}. We have
∇f(x0) = (−1, 0)ᵗ
∇g1(x0) = (0, 1)ᵗ
∇g2(x0) = (0, −1)ᵗ

Figure 2.1

Now,
G′ = {d : ∇gi(x0)ᵗd ≤ 0, i ∈ I}
We have
∇g1(x0)ᵗd ≤ 0 ⇒ (0, 1)(d1, d2)ᵗ ≤ 0 ⇒ d2 ≤ 0
and
∇g2(x0)ᵗd ≤ 0 ⇒ (0, −1)(d1, d2)ᵗ ≤ 0 ⇒ d2 ≥ 0
Hence d2 = 0 and d1 ∈ R. Therefore,
G′ = {(d1, 0)ᵗ : d1 ∈ R}
whereas T is the shaded region in Figure 2.1. Thus, Abadie's constraint qualification T = G′ is not satisfied.

Remark 2.9: The Abadie constraint qualification T = G′ can be equivalently stated as T ⊇ G′, since T ⊆ G′ is always true. We have
S = {x ∈ X : gi(x) ≤ 0, i = 1,…,m}
Let d ∈ T. Then

d = lim_{k→∞} λk(xk − x0)

where λk > 0, xk ∈ S for each k, and xk → x0. Since xk ∈ S, gi(xk) ≤ 0. For i ∈ I, we write

gi(xk) = gi(x0) + ∇gi(x0)ᵗ(xk − x0) + ‖xk − x0‖ α(x0; xk − x0)
 = ∇gi(x0)ᵗ(xk − x0) + ‖xk − x0‖ α(x0; xk − x0) ≤ 0

since gi(x0) = 0. Multiplying by λk > 0 and taking the limit as k → ∞,

∇gi(x0)ᵗd ≤ 0 ⇒ d ∈ G′

Linearly Constrained Problems: Below we show that if the constraints are linear, then Abadie's constraint qualification is automatically true. This implies that the KKT conditions are always necessary for a problem with linear constraints, whether the objective function is linear or nonlinear.

Lemma 2.10: Let A be an m × n matrix, let b be an m-vector, and let S = {x : Ax ≤ b}. Suppose x0 ∈ S is such that A1x0 = b1 and A2x0 < b2, where Aᵗ = (A1ᵗ, A2ᵗ) and bᵗ = (b1ᵗ, b2ᵗ). Then T = G′.

Proof: If A1 is vacuous, then by the definition of G′, G′ = Rⁿ. Furthermore, as x0 ∈ int S, T = Rⁿ. Thus G′ = T.

Now, suppose that A1 is not vacuous. Let d ∈ T, i.e.,

d = lim_{k→∞} λk(xk − x0), where xk ∈ S and λk > 0 for each k.

Then
A1(xk − x0) ≤ b1 − b1 = 0  (2.2)

Multiplying (2.2) by λk > 0 and taking the limit as k → ∞,

A1d ≤ 0 ⇒ d ∈ G′ ⇒ T ⊆ G′

Now, let d ∈ G′, i.e., A1d ≤ 0. To show that d ∈ T: since A2x0 < b2, there is a δ > 0 such that

A2(x0 + λd) < b2 for all λ ∈ (0, δ)

Further, since A1x0 = b1 and A1d ≤ 0, we have

A1(x0 + λd) ≤ b1 for all λ > 0
⇒ x0 + λd ∈ S for each λ ∈ (0, δ)
⇒ d ∈ T [use the definition of T]
⇒ G′ ⊆ T

Therefore, T = G′.
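For a polyhedral set, lemma 2.10 can be sanity-checked directly. Below is a toy sketch, assuming NumPy, at the vertex of the nonnegative orthant (both constraints binding, so A1 = A): membership in G′ and membership in T coincide for random directions.

```python
# Sanity check of lemma 2.10 at a vertex of S = {x : Ax <= b}, assuming NumPy.
import numpy as np

A = np.array([[-1.0, 0.0],
              [0.0, -1.0]])   # S = nonnegative orthant
b = np.zeros(2)
x0 = np.zeros(2)              # both constraints binding, so A1 = A

rng = np.random.default_rng(3)
for _ in range(1000):
    d = rng.normal(size=2)
    in_G_prime = bool(np.all(A @ d <= 0))
    # d is a tangent direction iff x0 + t d stays in S for small t > 0
    in_T = bool(np.all(A @ (x0 + 1e-6 * d) <= b))
    assert in_G_prime == in_T
print("T = G' verified on 1000 random directions")
```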

Other Constraint Qualifications
The KKT conditions can be developed under various constraint qualifications. In this section, we present some important constraint qualifications.

We know that local optimality implies that F0 ∩ T = ∅ and that the KKT conditions follow under the constraint qualification T = G′.

If we define a cone C ⊆ T, then F0 ∩ T = ∅ also implies that F0 ∩ C = ∅. Therefore, any constraint qualification of the form C = G′ will lead to the KKT conditions. Since C ⊆ T ⊆ G′, the constraint qualification C = G′ implies T = G′. Hence, the constraint qualification C = G′ is more restrictive than Abadie's qualification.

Local optimality ⇒ F0 ∩ T = ∅ ⇒ F0 ∩ C = ∅ ⇒ F0 ∩ G′ = ∅ ⇒ KKT conditions
(Theorem 2.5)  (if C ⊆ T)  (constraint qualification C = G′)  (Farkas' lemma)

We consider below several such cones whose closures are contained in T.

Note that

S = {x ∈ X : gi(x) ≤ 0, i = 1,…,m}, x0 is a feasible point, and

I = {i : gi(x0) = 0}

The cone of feasible directions of S at x0:
D = {d : d ≠ 0, x0 + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}

The cone of attainable directions of S at x0:
It is the cone of directions d such that there exist a δ > 0 and an α : R → Rⁿ such that α(λ) ∈ S for λ ∈ (0, δ), α(0) = x0, and

lim_{λ→0⁺} (α(λ) − α(0))/λ = d

It is denoted by A. In other words, d belongs to the cone of attainable directions if there is a feasible arc starting from x0 that is tangential to d.

The cone of interior directions of S at x0:
G0 = {d : ∇gi(x0)ᵗd < 0, i ∈ I}

Note 2.11: If X is open and each gi for i ∉ I is continuous at x0, then d ∈ G0 implies that x0 + λd ∈ S for λ > 0 sufficiently small.

Below, we show that all the above cones and their closures are contained in T.

Lemma 2.12: Let X be a nonempty set in Rⁿ, and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider the problem to minimize f(x) subject to gi(x) ≤ 0, i = 1,…,m, and x ∈ X. Let x0 be a feasible point and let I = {i : gi(x0) = 0}. Suppose that each gi for i ∈ I is differentiable at x0 and let G′ = {d : ∇gi(x0)ᵗd ≤ 0, i ∈ I}. Then

cl D ⊆ cl A ⊆ T ⊆ G′

where D, A and T are, respectively, the cone of feasible directions, the cone of attainable directions, and the cone of tangents of the feasible region at x0. Furthermore, if X is open and each gi for i ∉ I is continuous at x0, then G0 ⊆ D, so that

cl G0 ⊆ cl D ⊆ cl A ⊆ T ⊆ G′

where G0 is the cone of interior directions of the feasible region at x0.

Proof: If, for d ∈ D, we consider the linear arc α(λ) = x0 + λd, we see that D ⊆ A. Next, given d ∈ A with arc α, any sequence xk = α(λk) with λk ↓ 0⁺ converges to x0, and

d = lim_{k→∞} (α(λk) − α(0))/λk = lim_{k→∞} (xk − x0)/λk

Therefore A ⊆ T. By remark 2.9, T ⊆ G′. Hence

D ⊆ A ⊆ T ⊆ G′

As T and G′ are closed,

cl D ⊆ cl A ⊆ T ⊆ G′

Now note that, by lemma 1.7, G0 ⊆ D. Hence,

cl G0 ⊆ cl D ⊆ cl A ⊆ T ⊆ G′

Remark 2.13: To see how each of the five containments can be strict, we consider examples.

Example 2.14: Consider figure 2.2.

Figure 2.2

As there is no interior in the immediate vicinity of x0, G0 = ∅ = cl G0, whereas clearly D = cl D = G′; a feasible direction lies along the edges incident at x0.

Note that although any d ∈ G0 is a direction leading to interior feasible solutions, it is not true that any feasible direction that leads to interior points belongs to G0.

Example 2.15: [Figure 2.3]

Minimize −x1
Subject to x2 − (1 − x1)³ ≤ 0
 −x2 − (1 − x1)³ ≤ 0

Figure 2.3

Obviously G0 = ∅ at x0 = (1, 0)ᵗ, whereas d = (−1, 0)ᵗ gives interior feasible solutions.

The following example shows cl D ⊊ cl A.

Example 2.16: Consider the region defined by
x1 − x2² ≤ 0
−x1 + x2² ≤ 0
i.e., x1 = x2². The set of feasible points lies on the parabola x1 = x2². At x0 = (0, 0)ᵗ, D = ∅ = cl D, while
A = {d : d = λ(0, 1)ᵗ or d = λ(0, −1)ᵗ, λ ≥ 0} = G′

Now, we give an example to show that cl A ≠ T.

Example 2.17: Suppose that the feasible region is
S = {(1/k, 0)ᵗ : k = 1, 2,…} ∪ {(0, 0)ᵗ}, with x0 = (0, 0)ᵗ.
Then A = ∅ = cl A, since there are no feasible arcs. By definition,
T = {d : d = λ(1, 0)ᵗ, λ ≥ 0}
and T = G′.

Next, to see that T ⊊ G′:

Example 2.18: [Figure 2.1]

Minimize −x1
Subject to x2 − (1 − x1)³ ≤ 0
 −x2 ≤ 0

Here, T = {d : d = λ(−1, 0)ᵗ, λ ≥ 0}, while G′ = {d : d = λ(−1, 0)ᵗ or d = λ(1, 0)ᵗ, λ ≥ 0}.

Below, we state some constraint qualifications that validate the KKT conditions and discuss their interrelationships.

Slater's Constraint Qualification (SCQ): The set X is open, each gi, i ∈ I is pseudoconvex at x0, each gi, i ∉ I is continuous at x0, and there is an x̄ ∈ X such that gi(x̄) < 0 for all i ∈ I.

Linear Independence Constraint Qualification (LICQ): The set X is open, each gi, i ∉ I is continuous at x0, and ∇gi(x0), i ∈ I are linearly independent.

Cottle's Constraint Qualification (CCQ): The set X is open, each gi, i ∉ I is continuous at x0, and cl G0 = G′.

Zangwill's Constraint Qualification (ZCQ): cl D = G′

Kuhn-Tucker's Constraint Qualification (KTCQ): cl A = G′

Validity of the Constraint Qualifications and Their Interrelationships
In theorem 2.7, we showed that the KKT necessary optimality conditions hold under Abadie's constraint qualification T = G′. We show below that all the constraint qualifications stated above imply Abadie's constraint qualification, and hence that each validates the KKT necessary conditions. From lemma 2.12, it is clear that Cottle's constraint qualification implies Zangwill's, which implies that of Kuhn and Tucker, which in turn implies Abadie's qualification:

Cottle's CQ ⇒ Zangwill's CQ ⇒ KTCQ ⇒ Abadie's CQ

To show that Slater's CQ ⇒ Cottle's CQ ⇐ LICQ:

Suppose Slater's constraint qualification holds. Then there is an x̄ ∈ X such that gi(x̄) < 0, i ∈ I. Since gi(x̄) < 0 and gi(x0) = 0, we have gi(x̄) < gi(x0). By the pseudoconvexity of gi at x0,

∇gi(x0)ᵗ(x̄ − x0) < 0
⇒ d = (x̄ − x0) ∈ G0, i.e., G0 ≠ ∅

Therefore cl G0 = G′ (see note 2.20), and hence

Slater's CQ ⇒ Cottle's CQ

Now suppose that the linear independence constraint qualification is satisfied, i.e., ∑_{i∈I} ui∇gi(x0) = 0 has no nonzero solution u. By theorem 1.14 (Gordan's theorem), it follows that there exists a vector d such that ∇gi(x0)ᵗd < 0, i ∈ I. Thus G0 ≠ ∅, and hence Cottle's qualification holds true. Therefore,

LICQ ⇒ Cottle's CQ

The counterexamples given also show that all these implications are one-way.

Example 2.19: [Example 2.8, Figure 2.1]

Minimize −x1
Subject to x2 − (1 − x1)³ ≤ 0
 −x2 ≤ 0

∇f(x0) = (−1, 0)ᵗ, ∇g1(x0) = (0, 1)ᵗ, ∇g2(x0) = (0, −1)ᵗ

In Figure 2.1, observe that the FJ conditions hold at x0 = (1, 0) with u0 = 0 and u1 = u2 > 0 arbitrary. Hence, x0 is not a KKT point. Thus, no constraint qualification can possibly hold true.

Note 2.20: Cottle's constraint qualification is equivalent to requiring that G0 ≠ ∅. Let A be an m × n matrix and consider the cones

G0 = {d : Ad < 0}
G′ = {d : Ad ≤ 0}

It is easy to see that G0 is an open convex cone and G′ is a closed convex cone. Further, G0 = int G′. Next,

cl G0 = G′ ⟺ G0 ≠ ∅

Remark 2.21: We know that Slater's CQ and LICQ both imply Cottle's CQ. Hence, whenever these constraint qualifications hold at a local minimum x0, x0 is an FJ point with the Lagrangian multiplier u0 > 0. On the other hand, we might have Zangwill's, KT's or Abadie's CQ holding at a local minimum x0 with u0 possibly being zero in some solution to the FJ conditions.

Chapter 3 Lagrangian Duality & Saddle Point Optimality Conditions

Given a nonlinear programming problem, there is another nonlinear programming problem closely associated with it. The former is called the primal problem and the latter is called the Lagrangian dual problem. Under certain convexity assumptions and suitable constraint qualifications, the primal and dual problems have equal optimal objective values; hence, it is possible to solve the primal problem indirectly by solving the dual problem.

The Lagrangian Dual Problem
Consider the following nonlinear programming problem P, called the primal problem.

Primal Problem P: Minimize f(x)
 Subject to gi(x) ≤ 0, i = 1,…,m
 hi(x) = 0, i = 1,…,l
 x ∈ X

Lagrangian Dual Problem D: Maximize θ(u, v)
 Subject to u ≥ 0
where
θ(u, v) = inf {f(x) + ∑_{i=1}^{m} ui gi(x) + ∑_{i=1}^{l} vi hi(x) : x ∈ X}

Remark 3.1: The Lagrangian dual function θ may assume the value −∞ for some vector (u, v). The optimization problem that evaluates θ(u, v) is called the Lagrangian dual subproblem. The multiplier ui is nonnegative, whereas vi is unrestricted. Since the dual problem consists of maximizing the infimum (greatest lower bound) of the function

f(x) + ∑_{i=1}^{m} ui gi(x) + ∑_{i=1}^{l} vi hi(x),

it is sometimes referred to as the max-min dual problem.

Note 3.2: The primal and Lagrangian dual problems can be written in the following form using vector notation, where f : Rⁿ → R, g : Rⁿ → Rᵐ is a vector function whose i-th component is gi, and h : Rⁿ → Rˡ is a vector function whose i-th component is hi.

Primal Problem P: Minimize f(x)
 Subject to g(x) ≤ 0
 h(x) = 0
 x ∈ X

Lagrangian Dual Problem D: Maximize θ(u, v)
 Subject to u ≥ 0
where θ(u, v) = inf {f(x) + uᵗg(x) + vᵗh(x) : x ∈ X}

Remark 3.3: Given a nonlinear programming problem, several Lagrangian dual problems can be devised, depending on which constraints are handled as g(x) ≤ 0 and h(x) = 0 and which constraints are treated by the set X. This choice can affect both the optimal value of D (as in nonconvex situations) and the effort expended in evaluating and updating the dual function θ during the course of solving the dual problem. Hence, an appropriate selection of the set X must be made, depending on the structure of the problem and the purpose for solving D.

Geometric Interpretation of the Dual Problem
For simplicity, we shall consider only one inequality constraint and assume that no equality constraints exist. Then the primal problem is

Minimize f(x)
Subject to g(x) ≤ 0
 x ∈ X

In the (y, z) plane, the set G = {(y, z) : y = g(x), z = f(x) for some x ∈ X} is the image of X under the (g, f) map. The primal problem is to find a point in G with y ≤ 0 that has minimum ordinate. Obviously, this point is (ȳ, z̄) in Figure 3.1.

Figure 3.1 – Geometric interpretation of Lagrangian duality

Now suppose that u ≥ 0 is given. To determine θ(u), we need to minimize f(x) + ug(x) over all x ∈ X. Put y = g(x) and z = f(x), x ∈ X. This problem is equivalent to minimizing z + uy over points in G. Note that z + uy = α is the equation of a straight line with slope −u and intercept α on the z-axis. To minimize z + uy over G, we need to move the line z + uy = α parallel to itself as far down (along its negative gradient) as possible until it supports G from below, i.e., the set G is above the line and touches it. Then, the intercept on the z-axis gives θ(u). The dual problem is therefore equivalent to finding the slope of a supporting hyperplane such that its intercept on the z-axis is maximal. In the figure, such a hyperplane has slope −ū and supports the set G at the point (ȳ, z̄). Thus, the optimal dual solution is ū, and the optimal dual objective value is z̄. Furthermore, the optimal primal and dual objective values are equal in this case.

There is another related interpretation that provides an important conceptual tool. For the problem under consideration, define the function

ν(y) = min {f(x) : g(x) ≤ y, x ∈ X}

The function ν is called a perturbation function, since it is the optimal value function of a problem obtained from the original problem by perturbing the right-hand side of the inequality constraint g(x) ≤ 0 to y from the value of zero. Note that ν(y) is a nonincreasing function of y since, as y increases, the feasible region of the perturbed problem enlarges (or stays the same). In Figure 3.1, observe that ν corresponds here to the lower envelope of G between points A and B, because this envelope is itself monotone decreasing. Moreover, ν remains constant at the value at point B for values of y higher than that at B, and becomes ∞ for points to the left of A because of infeasibility. In particular, if ν is differentiable at the origin, we observe that ν′(0) = −ū. Hence, the marginal rate of change in the objective function value with an increase in the right-hand side of the constraint from its present value of zero is given by −ū, the negative of the Lagrangian multiplier value at optimality. If ν is convex but not differentiable at the origin, then −ū is evidently a subgradient of ν at y = 0. In either case, we know that

ν(y) ≥ ν(0) − ūy for all y ∈ R

As we shall see later, ν can be nondifferentiable and/or nonconvex, but the condition ν(y) ≥ ν(0) − ūy holds true for all y ∈ R if and only if ū is a KKT Lagrangian multiplier corresponding to an optimal solution x0 such that it solves the dual problem with equal primal and dual objective values. This happens to be the case in Figure 3.1.

Example 3.4: Consider the following primal problem:

Minimize x1² + x2²
Subject to −x1 − x2 + 4 ≤ 0
 x1, x2 ≥ 0

Observe that the optimal solution occurs at the point (x1, x2) = (2, 2) with objective value 8. Let g(x) = −x1 − x2 + 4 and X = {(x1, x2) : x1, x2 ≥ 0}. The dual function is given by

θ(u) = inf {x1² + x2² + u(−x1 − x2 + 4) : x1, x2 ≥ 0}
 = inf {x1² − ux1 : x1 ≥ 0} + inf {x2² − ux2 : x2 ≥ 0} + 4u

It is easy to see that the above infima are achieved at x1 = x2 = u/2 if u ≥ 0 and at x1 = x2 = 0 if u < 0. Hence,

θ(u) = −u²/2 + 4u if u ≥ 0
 4u if u < 0
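A small numerical check, a sketch assuming NumPy and using the closed form just derived, confirms that the dual maximum occurs at ū = 4 with θ(ū) = 8, matching the primal optimal value:

```python
# Numerical check of Example 3.4, assuming NumPy: maximize the dual function
# theta(u) = inf {x1^2 + x2^2 + u(-x1 - x2 + 4) : x1, x2 >= 0} over a grid.
import numpy as np

def theta(u):
    # The infimum separates; each x^2 - u*x over x >= 0 is minimized at
    # x = u/2 when u >= 0 and at x = 0 when u < 0.
    x = max(u / 2.0, 0.0)
    return 2 * (x * x - u * x) + 4 * u

us = np.linspace(-2.0, 8.0, 1001)
vals = np.array([theta(u) for u in us])
print("argmax u =", us[vals.argmax()], "theta =", vals.max())  # u = 4, value 8
```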

Figure 3.2 – Graph of the primal and dual functions

Note that θ is a concave function, and its maximum over u ≥ 0 occurs at ū = 4. Note also that the optimal primal and dual objective values are both equal to 8. Now let us consider the problem in the (y, z) plane, where y = g(x) and z = f(x). We find G, the image of X = {(x1, x2) : x1 ≥ 0, x2 ≥ 0} under the (g, f) map, by deriving explicit expressions for the lower and upper envelopes of G, denoted respectively by α and β. Given y, note that α(y) and β(y) are the optimal objective values of the following problems P1 and P2, respectively:

Problem P1: Minimize x1² + x2² subject to −x1 − x2 + 4 = y, x1, x2 ≥ 0
Problem P2: Maximize x1² + x2² subject to −x1 − x2 + 4 = y, x1, x2 ≥ 0

For P1: It is easy to see that the optimal (minimum) value is attained at ((4 − y)/2, (4 − y)/2), with objective function value α(y) = (4 − y)²/2 for y ≤ 4.
For P2: The optimal points are (0, 4 − y) and (4 − y, 0), with β(y) = (4 − y)² for y ≤ 4.

Note that x ∈ X implies x1, x2 ≥ 0, so that −x1 − x2 + 4 ≤ 4. Thus, every point x ∈ X corresponds to y ≤ 4. Note that the optimal dual solution is ū = 4, which is the negative of the slope of the supporting hyperplane shown in Figure 3.2. The optimal dual objective value is 8 and is equal to the optimal primal objective value. Next, the perturbation function ν(y), y ∈ R, corresponds to the lower envelope α(y) for y ≤ 4, and ν(y) remains constant at the value of 0 for y ≥ 4. The slope ν′(0) equals −4, the negative of the optimal Lagrangian multiplier value. Moreover, we have

ν(y) ≥ ν(0) − 4y for all y ∈ R

Figure 3.3 – Geometric illustration of the example

We shall show that this is a necessary and sufficient condition for the primal and dual objective values to match at optimality.

Duality Theorems and Saddle Point Optimality Conditions

Theorem 3.5 (Weak Duality): Let x be a feasible solution to Problem P: minimize f(x) subject to g(x) ≤ 0, h(x) = 0, x ∈ X. Let (u, v) be a feasible solution to Problem D: maximize θ(u, v) subject to u ≥ 0, where θ(u, v) = inf {f(x) + uᵗg(x) + vᵗh(x) : x ∈ X}. Then f(x) ≥ θ(u, v).

Proof: By the definition of θ, and since x ∈ X, we have

θ(u, v) = inf {f(y) + uᵗg(y) + vᵗh(y) : y ∈ X}
 ≤ f(x) + uᵗg(x) + vᵗh(x)
 ≤ f(x) [since u ≥ 0, g(x) ≤ 0, and h(x) = 0]

Corollary 3.6: inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0} ≥ sup {θ(u, v) : u ≥ 0}

Corollary 3.7: If f(x0) = θ(ū, v̄), where ū ≥ 0 and x0 ∈ {x ∈ X : g(x) ≤ 0, h(x) = 0}, then x0 and (ū, v̄) solve the primal and dual problems, respectively.
Proof: f(x0) = θ(ū, v̄) ⇒ θ(u, v) ≤ f(x0) = θ(ū, v̄) ≤ f(x) for all dual feasible (u, v) and primal feasible x ⇒ x0 is an optimal solution of P and (ū, v̄) is an optimal solution of D.

Corollary 3.8: If inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = −∞, then θ(u, v) = −∞ for each u ≥ 0.

Corollary 3.9: If sup {θ(u, v) : u ≥ 0} = ∞, then the primal problem has no feasible solution.

Duality Gap: By corollary 3.6 to the weak duality theorem, the optimal objective value of the primal problem is greater than or equal to the optimal objective value of the dual problem. If strict inequality holds true, a duality gap is said to exist.

Figure 3.4 – Illustration of a duality gap

The figure illustrates the case of a duality gap for a problem with a single inequality constraint and no equality constraints. The perturbation function ν(y), y ∈ R, is the greatest monotone nonincreasing function that envelopes G from below. The optimal primal value is ν(0). The greatest intercept on the z-axis achieved by a hyperplane that supports G from below gives the optimal dual objective value. In particular, there does not exist a ū ≥ 0 such that ν(y) ≥ ν(0) − ūy for all y ∈ R.

Example 3.10: Consider the following problem:

Minimize f(x) = −2x1 + x2
Subject to h(x) = x1 + x2 − 3 = 0
 (x1, x2) ∈ X

where X = {(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)}. It is easy to see that x0 = (2, 1) is the optimal solution to the primal problem, with objective value equal to −3. The dual function is given by

θ(v) = min {f(x) + vh(x) : x ∈ X}
 = min {−2x1 + x2 + v(x1 + x2 − 3) : (x1, x2) ∈ X}
 = min {−3v, 4 + v, 5v − 4, v − 8, 0, −3}

Thus, the explicit expression for θ is given by

θ(v) = −4 + 5v for v ≤ −1
 −8 + v for −1 ≤ v ≤ 2
 −3v for v ≥ 2
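The duality gap here is easy to exhibit numerically. The following sketch, assuming NumPy, evaluates θ(v) as the minimum of the six affine pieces and compares the dual and primal optima:

```python
# Dual function of Example 3.10, assuming NumPy: theta(v) is the lower
# envelope of six affine functions, one per point of the discrete set X.
import numpy as np

X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]
f = lambda x: -2 * x[0] + x[1]
h = lambda x: x[0] + x[1] - 3
theta = lambda v: min(f(x) + v * h(x) for x in X)

vs = np.linspace(-4.0, 6.0, 1001)
vals = np.array([theta(v) for v in vs])
print("dual optimum:  v =", vs[vals.argmax()], " theta =", vals.max())  # 2, -6
print("primal optimum:", min(f(x) for x in X if h(x) == 0))             # -3
```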

Figure 3.53.53.5 --- Dual function for Example 3.10 The dual function is shown in above figure 3.5, and the optimal solution is = 2 with objective value -6. There exists a duality gap. Now, consider the graph G = {(h(x, f(x: x ∈ X}, which consist of a finite number of points. In particular, G = {(x 1 + x 2 – 3, -2x 1 + x 2 : (x 1, x2 ∈ X}

Figure 3.3.6666 - Geometric interpretation of Example 3.10 The supporting hyperplane, whose intercept on the vertical axis is maximal, is equal to -6 with the slop -2. Thus the optimal dual solution is = 2, with objective value -6. Furthermore, note that the points in the set G on the vertical axis correspond to the primal feasible points and hence the primal objective value is equal to -3. The perturbation function here is defined as (y = min {f(x : h(x = y, x ∈ X} Because of the discrete nature of X, h(x can take only a finite possible number of values. Hence noting G in figure 3.6, we obtain (-3 = 0, (0 = -3, (1 = -8, and (5 = -4 with (y = -∞ for all y ∈ R otherwise. Again, the optimal primal value is (0 = -3, and there does not exist a such that (y ≥ (0 –y. Hence, a duality gap exists. Conditions that guarantee absence of a duality gap are given by Strong duality theorem. Further, we shell related these conditions to perturbation function.   nnn nnn nnn Lemma 3.113.11:: Let X be a non-empty convex set in RRR . Let : RRR RRR and ggg : RRR mmm nnn RRR be convex, and let hhh : RRR R be affine; i.e, hhh is of the form hhh(xh = AAAxA – bbb.b If system 1 below has no solution x, then system 2 has a solution (u 0, uuu, u vvv. v The converse holds if u 0 > 0. System 1: α(x < 0 ggg(xg ≤ 0 hhh(xh === 000 for some x ∈ X ttt ttt System 2: u0(x + uuu ggg(xg + vvv hhh(xh ≥ 0 for all x ∈ X (u 0, uuuu ≥ 000 (u 0, uuu,u vvvv ≠ 000 ProofProof:: Suppose the System 1 has no solution, and consider the following set: ⋀ = {(p, qqq,q rrrr : p > (x, qqq = ggg(x,g rrr = hhh(xh for some x ∈ X} As X, and ggg are convex and hhh is affine, ⋀ is convex. Since System 1 has no solution (0, 000, 0 0000 ∉ ⋀ . By corollary 1.16, there exists a non-zero (u 0, uuu, u vvvv such that ttt ttt u0p + uuu q + vvv r ≥ 0 for each (p, qqq,q rrrr ∈ cl ⋀ (3.1 Now fix an x ∈ X. Since p and qqq can be made arbitrarily large, (3.1 holds true only if u 0 ≥ 0 and uuu ≥ 000.0 Furthermore; (p, qqq,q rrrr = [(x, ggg(x,g hhh(x]h ∈ cl ⋀ Therefore, from (3.1, we get ttt ttt u0(x + uuu ggg(xg + vvv hhh(xh ≥ 0 Since the above inequality is true for each x ∈ X, System 2 has a solution. To prove the converse, assume that System 2 has a solution (u 0, uuu,u vvvv such that u 0 > 0 and uuu ≥ 000,0 satisfying ttt ttt u0(x + uuu ggg(xg + vvv hhh(xh ≥ 0 for each x ∈ X Now let x ∈ X be such that g (x ≤ 0 and h (x = 000. 0 From the above inequality, since u ≥ 000, 0 we conclude that u 0(x ≥ 0. Since u 0 > 0, (x ≥ 0; and, hence, System 1 has no solution.

The following theorem, the strong duality theorem, shows that under suitable convexity assumptions and under a constraint qualification, the optimal objective function values of the primal and dual problems are equal.

Theorem 3.12 (Strong Duality Theorem): Let X be a nonempty convex set in Rⁿ. Let f : Rⁿ → R and g : Rⁿ → Rᵐ be convex, and let h : Rⁿ → Rˡ be affine; i.e., h is of the form h(x) = Ax − b. Suppose that the following constraint qualification holds true: there exists an x̂ ∈ X such that g(x̂) < 0 and h(x̂) = 0, and 0 ∈ int h(X), where h(X) = {h(x) : x ∈ X}. Then

inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = sup {θ(u, v) : u ≥ 0}  (3.2)

Furthermore, if the inf is finite, then sup {θ(u, v) : u ≥ 0} is achieved at (ū, v̄) with ū ≥ 0. If the inf is achieved at x0, then ūᵗg(x0) = 0.

Proof: Let γ = inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0}. By assumption, γ < ∞. If γ = −∞, then by corollary 3.8 to the weak duality theorem, sup {θ(u, v) : u ≥ 0} = −∞, and (3.2) holds true. Hence, suppose γ is finite, and consider the following system:

f(x) − γ < 0, g(x) ≤ 0, h(x) = 0, x ∈ X

By the definition of γ, this system has no solution. Hence, from lemma 3.11, there exists a nonzero vector (u0, u, v) with (u0, u) ≥ 0 such that

u0[f(x) − γ] + uᵗg(x) + vᵗh(x) ≥ 0 for all x ∈ X  (3.3)

To show that u0 > 0, suppose u0 = 0. By assumption, there exists an x̂ ∈ X such that g(x̂) < 0 and h(x̂) = 0. Substituting into (3.3), it follows that uᵗg(x̂) ≥ 0. Since g(x̂) < 0 and u ≥ 0, uᵗg(x̂) ≥ 0 is possible only if u = 0. But u0 = 0 and u = 0 imply, from (3.3), that vᵗh(x) ≥ 0 for all x ∈ X. Further, since 0 ∈ int h(X), we can pick an x ∈ X such that h(x) = −λv with λ > 0. Therefore,

0 ≤ vᵗh(x) = −λ‖v‖² ⇒ v = 0

Thus, u0 = 0 implies that (u0, u, v) = 0, which is impossible. Hence u0 > 0. Dividing (3.3) by u0 and denoting u/u0 and v/u0 by ū and v̄, respectively, we get

f(x) + ūᵗg(x) + v̄ᵗh(x) ≥ γ for all x ∈ X  (3.4)

This shows that θ(ū, v̄) = inf {f(x) + ūᵗg(x) + v̄ᵗh(x) : x ∈ X} ≥ γ. In view of the weak duality theorem, it is then clear that θ(ū, v̄) = γ, and (ū, v̄) solves the dual problem. Now, suppose that x0 is an optimal solution to the primal problem; i.e., x0 ∈ X, g(x0) ≤ 0, h(x0) = 0, and f(x0) = γ. From (3.4), letting x = x0, we get ūᵗg(x0) ≥ 0. Since ū ≥ 0 and g(x0) ≤ 0, we get ūᵗg(x0) = 0.

Definition 3.13 (Saddle Point Criterion): Given the primal problem P, define the Lagrangian function

φ(x, u, v) = f(x) + uᵗg(x) + vᵗh(x)

A solution (x0, ū, v̄) is called a saddle point of the Lagrangian function if x0 ∈ X, ū ≥ 0, and

φ(x0, u, v) ≤ φ(x0, ū, v̄) ≤ φ(x, ū, v̄) for all x ∈ X and all (u, v) with u ≥ 0  (3.5)

Observe that x0 minimizes φ over X when (u, v) is fixed at (ū, v̄), and that (ū, v̄) maximizes φ over all (u, v) with u ≥ 0 when x is fixed at x0.

The following result characterizes a saddle point solution and shows that its existence is a necessary and sufficient condition for the absence of a duality gap.

Theorem 3.14 (Saddle Point Optimality and Absence of a Duality Gap): A solution (x0, ū, v̄) with x0 ∈ X and ū ≥ 0 is a saddle point for the Lagrangian function φ(x, u, v) = f(x) + uᵗg(x) + vᵗh(x) if and only if
a. φ(x0, ū, v̄) = min {φ(x, ū, v̄) : x ∈ X},
b. g(x0) ≤ 0 and h(x0) = 0, and
c. ūᵗg(x0) = 0.

Moreover, (x0, ū, v̄) is a saddle point if and only if x0 and (ū, v̄) are, respectively, optimal solutions to the primal and dual problems P and D with no duality gap, i.e., with f(x0) = θ(ū, v̄).

Proof: Suppose that (x0, ū, v̄) is a saddle point for the Lagrangian function φ. By definition, condition (a) is true. Again by definition 3.13 (first inequality),

f(x0) + ūᵗg(x0) + v̄ᵗh(x0) ≥ f(x0) + uᵗg(x0) + vᵗh(x0) for all (u, v) with u ≥ 0  (3.6)
⇒ (u − ū)ᵗg(x0) + (v − v̄)ᵗh(x0) ≤ 0

As u ≥ 0 and v are arbitrary, we must have g(x0) ≤ 0 and h(x0) = 0. Letting u = 0 in (3.6) gives ūᵗg(x0) ≥ 0. Next, ū ≥ 0 and g(x0) ≤ 0 imply ūᵗg(x0) ≤ 0. Therefore,

ūᵗg(x0) = 0

Thus, conditions (a), (b) and (c) hold. Conversely, suppose that we are given (x0, ū, v̄) with x0 ∈ X and ū ≥ 0 such that conditions (a), (b) and (c) hold. Then, by property (a),

φ(x0, ū, v̄) ≤ φ(x, ū, v̄) for all x ∈ X

Furthermore,

φ(x0, ū, v̄) = f(x0) + ūᵗg(x0) + v̄ᵗh(x0)
 = f(x0) [since ūᵗg(x0) = 0 and h(x0) = 0]
 ≥ f(x0) + uᵗg(x0) + vᵗh(x0) [since g(x0) ≤ 0 and h(x0) = 0]
 = φ(x0, u, v) for all (u, v) with u ≥ 0

Thus we get φ(x0, u, v) ≤ φ(x0, ū, v̄) ≤ φ(x, ū, v̄). Hence, (x0, ū, v̄) is a saddle point.

Now, to prove the second part of the theorem, suppose (x0, ū, v̄) is a saddle point. By property (b), x0 is feasible to Problem P. Since ū ≥ 0, we also have that (ū, v̄) is feasible to D. Moreover, by properties (a), (b) and (c),

θ(ū, v̄) = φ(x0, ū, v̄) = f(x0) + ūᵗg(x0) + v̄ᵗh(x0) = f(x0)

By corollary 3.7 to the weak duality theorem, x0 and (ū, v̄) solve P and D, respectively, with no duality gap.

Now, suppose that x0 and (ū, v̄) are optimal solutions to problems P and D, respectively, with f(x0) = θ(ū, v̄). Then x0 ∈ X, g(x0) ≤ 0, h(x0) = 0 and ū ≥ 0, by primal and dual feasibility. Thus,

θ(ū, v̄) = min {f(x) + ūᵗg(x) + v̄ᵗh(x) : x ∈ X}
 ≤ f(x0) + ūᵗg(x0) + v̄ᵗh(x0)
 = f(x0) + ūᵗg(x0) ≤ f(x0)

Since f(x0) = θ(ū, v̄), equality holds throughout, so ūᵗg(x0) = 0 and

φ(x0, ū, v̄) = f(x0) = θ(ū, v̄) = min {φ(x, ū, v̄) : x ∈ X}

Hence, properties (a), (b) and (c) hold; x0 ∈ X and ū ≥ 0 then imply that (x0, ū, v̄) is a saddle point.

Corollary 3.15: Suppose that X, f and g are convex and that h is affine; i.e., h is of the form h(x) = Ax − b. Further, suppose that 0 ∈ int h(X) and that there exists an x̂ ∈ X with g(x̂) < 0 and h(x̂) = 0. If x0 is an optimal solution to the primal Problem P, then there exists a vector (ū, v̄) with ū ≥ 0 such that (x0, ū, v̄) is a saddle point.

Proof: By the strong duality theorem, there exists an optimal solution (ū, v̄), ū ≥ 0, to Problem D such that f(x0) = θ(ū, v̄). Hence, by theorem 3.14, (x0, ū, v̄) is a saddle point solution.
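For Example 3.4, the saddle point inequalities (3.5) can be spot-checked numerically at (x0, ū) = ((2, 2), 4); a minimal sketch assuming NumPy:

```python
# Saddle point check for Example 3.4, assuming NumPy: at x0 = (2, 2), u = 4
# the Lagrangian phi(x, u) = f(x) + u g(x) should satisfy both sides of (3.5).
import numpy as np

f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: -x[0] - x[1] + 4
phi = lambda x, u: f(x) + u * g(x)

x0, u_bar = np.array([2.0, 2.0]), 4.0
rng = np.random.default_rng(1)

left = all(phi(x0, u) <= phi(x0, u_bar) + 1e-12
           for u in rng.uniform(0.0, 10.0, 1000))            # over u >= 0
right = all(phi(x0, u_bar) <= phi(x, u_bar) + 1e-12
            for x in rng.uniform(0.0, 5.0, size=(1000, 2)))  # over x in X
print("saddle point inequalities hold:", left and right)     # expect True
```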

The dual optimal value is given by

θ* ≡ sup {θ(u, v) : u ≥ 0} = sup_{u≥0, v} inf_{x∈X} [φ(x, u, v)]

If we interchange the order of optimization, we get

θ* ≤ inf_{x∈X} sup_{u≥0, v} [φ(x, u, v)]

But the sup of φ(x, u, v) = f(x) + uᵗg(x) + vᵗh(x) over (u, v) with u ≥ 0 is infinity, unless g(x) ≤ 0 and h(x) = 0, in which case it is f(x). Hence,

θ* ≤ inf {f(x) : g(x) ≤ 0, h(x) = 0, x ∈ X}

which is the primal optimal value. Hence, we see that the primal and dual objective values match at optimality if and only if the interchange of the foregoing inf and sup operations leaves the optimal value unchanged. By the above theorem, assuming that an optimum exists, this occurs if and only if there exists a saddle point (x0, ū, v̄) for the Lagrangian function φ.

Relationship Between the Saddle Point Criterion and the KKT Conditions

Theorem 3.16: Let S = {x ∈ X : g(x) ≤ 0, h(x) = 0}, and consider Problem P to minimize f(x) subject to x ∈ S.

1. Suppose that x0 ∈ S satisfies the KKT conditions, i.e., there exist ū ≥ 0 and v̄ such that

∇f(x0) + ∇g(x0)ᵗū + ∇h(x0)ᵗv̄ = 0
ūᵗg(x0) = 0  (3.7)

2. Suppose that f and gi for i ∈ I are convex at x0, where I = {i : gi(x0) = 0}.

3. Further, suppose that if v̄i ≠ 0, then hi is affine.

Then (x0, ū, v̄) is a saddle point for the Lagrangian function

φ(x, u, v) = f(x) + uᵗg(x) + vᵗh(x)

Conversely, suppose that (x0, ū, v̄) with x0 ∈ int X and ū ≥ 0 is a saddle point solution. Then x0 is feasible to Problem P and, furthermore, (x0, ū, v̄) satisfies the KKT conditions specified by (3.7).

Proof: Suppose that (x0, ū, v̄), with x0 ∈ S and ū ≥ 0, satisfies the KKT conditions specified by (3.7). By the convexity at x0 of f and gi, i ∈ I, and since hi is affine whenever v̄i ≠ 0, we get for all x ∈ X:

f(x) ≥ f(x0) + ∇f(x0)ᵗ(x − x0)  (3.8a)
gi(x) ≥ gi(x0) + ∇gi(x0)ᵗ(x − x0) for i ∈ I  (3.8b)
hi(x) = hi(x0) + ∇hi(x0)ᵗ(x − x0) for each i = 1,…,l with v̄i ≠ 0  (3.8c)

Multiplying (3.8b) by ūi ≥ 0 and (3.8c) by v̄i, and adding all to (3.8a),

f(x) + ∑_{i∈I} ūigi(x) + ∑ v̄ihi(x) ≥ f(x0) + ∑_{i∈I} ūigi(x0) + ∑ v̄ihi(x0)
 + [∇f(x0) + ∑_{i∈I} ūi∇gi(x0) + ∑ v̄i∇hi(x0)]ᵗ(x − x0)
⇒ f(x) + ∑_{i∈I} ūigi(x) + ∑ v̄ihi(x) ≥ f(x0) + ∑_{i∈I} ūigi(x0) + ∑ v̄ihi(x0) [using (3.7)]

i.e., φ(x, ū, v̄) ≥ φ(x0, ū, v̄) for all x ∈ X. Also, since g(x0) ≤ 0, h(x0) = 0 and ūᵗg(x0) = 0,

f(x0) + ∑ uigi(x0) + ∑ vihi(x0) ≤ f(x0) + ∑ ūigi(x0) + ∑ v̄ihi(x0)

i.e., φ(x0, u, v) ≤ φ(x0, ū, v̄) for all (u, v) with u ≥ 0. Thus

φ(x0, u, v) ≤ φ(x0, ū, v̄) ≤ φ(x, ū, v̄)

i.e., (x0, ū, v̄) satisfies the saddle point conditions.

For the converse, suppose (x0, ū, v̄) with x0 ∈ int X and ū ≥ 0 is a saddle point solution. Since φ(x0, u, v) ≤ φ(x0, ū, v̄) for all (u, v) with u ≥ 0, by theorem 3.14,

g(x0) ≤ 0, h(x0) = 0 and ūᵗg(x0) = 0

This shows that x0 is feasible to Problem P. Since φ(x0, ū, v̄) ≤ φ(x, ū, v̄) for all x ∈ X, x0 solves the problem to minimize φ(x, ū, v̄) subject to x ∈ X. Since x0 ∈ int X, we have ∇xφ(x0, ū, v̄) = 0, i.e.,

∇f(x0) + ∇g(x0)ᵗū + ∇h(x0)ᵗv̄ = 0

Remark 3.17: The theorem shows that if x0 is a KKT point, then under certain convexity assumptions the Lagrangian multipliers in the KKT conditions also serve as the multipliers in the saddle point criterion. Conversely, the multipliers in the saddle point conditions are the Lagrangian multipliers of the KKT conditions.

Properties of the Dual Function: In view of theorems 3.12 and 3.14, it is possible to solve the primal problem indirectly by solving the dual problem. For this, we need to examine the properties of the dual function θ. We shall assume that the set X is compact; this will simplify the proofs of several of the theorems. Note that this assumption is not unduly restrictive: if X is not bounded, one can add suitable lower and upper bounds on the variables such that the feasible region is not affected. For convenience, we shall also combine the vectors u and v as w and the functions g and h as β; i.e., w = (u, v)ᵗ and β = (g, h)ᵗ.

First, we show that θ is concave.

Theorem 3.18: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous. Then θ, defined by

θ(w) = inf {f(x) + wᵗβ(x) : x ∈ X},

is concave over R^{m+l}.

Proof: Since f and β are continuous and X is compact, θ is finite everywhere on R^{m+l}. Let w1, w2 ∈ R^{m+l} and λ ∈ (0, 1). We then have

θ[λw1 + (1−λ)w2] = inf {f(x) + [λw1 + (1−λ)w2]ᵗβ(x) : x ∈ X}
 = inf {λ[f(x) + w1ᵗβ(x)] + (1−λ)[f(x) + w2ᵗβ(x)] : x ∈ X}
 ≥ λ inf {f(x) + w1ᵗβ(x) : x ∈ X} + (1−λ) inf {f(x) + w2ᵗβ(x) : x ∈ X}
 = λθ(w1) + (1−λ)θ(w2)

since inf f(x) + inf g(x) ≤ inf {f(x) + g(x)}. Therefore, θ is concave.

Remark 3.19: Since θ is concave, by theorem 1.19 a local optimum of θ is also a global optimum.
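Theorem 3.18 holds even when the primal problem is nonconvex, as in Example 3.10. A quick midpoint test, sketched below assuming NumPy, illustrates this on that example's dual function:

```python
# Midpoint concavity test for the dual function of Example 3.10 (NumPy).
import numpy as np

X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]
theta = lambda v: min(-2 * x + y + v * (x + y - 3) for (x, y) in X)

rng = np.random.default_rng(2)
for a, b in rng.uniform(-10.0, 10.0, size=(10000, 2)):
    assert theta((a + b) / 2) >= (theta(a) + theta(b)) / 2 - 1e-9
print("theta passed the midpoint concavity test")
```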

Differentiability of θ: It will be convenient to introduce the following set of optimal solutions to the Lagrangian dual subproblem:

X(w) = {y : y minimizes f(x) + wᵗβ(x) over x ∈ X}

The differentiability of θ at any given point w̄ depends on the elements of X(w̄). Theorem 3.21 below shows that if the set X(w̄) is a singleton, then θ is differentiable at w̄.

Lemma 3.20: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous. Let w̄ ∈ R^{m+l}, and suppose that X(w̄) is the singleton {x0}. Suppose that wk → w̄, and let xk ∈ X(wk) for each k. Then xk → x0.

Proof: By contradiction, suppose that wk → w̄, xk ∈ X(wk) and ‖xk − x0‖ > ε > 0 for k ∈ K, where K is some index set. Since X is compact, the sequence {xk}K has a convergent subsequence {xk}K′ with limit y in X. Note that ‖y − x0‖ ≥ ε > 0, and hence y and x0 are distinct. Furthermore, for each wk with k ∈ K′ we have

f(xk) + wkᵗβ(xk) ≤ f(x0) + wkᵗβ(x0)

Taking the limit as k → ∞ in K′, and noting that xk → y, wk → w̄, and that f and β are continuous, it follows that

f(y) + w̄ᵗβ(y) ≤ f(x0) + w̄ᵗβ(x0)

Therefore y ∈ X(w̄), contradicting the assumption that X(w̄) is a singleton.

Theorem 3.21: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous. Let w̄ ∈ R^{m+l}, and suppose that X(w̄) is the singleton {x0}. Then θ is differentiable at w̄ with gradient ∇θ(w̄) = β(x0).

Proof: Since f and β are continuous and X is compact, for any given w there exists an xw ∈ X(w). From the definition of θ, the following two inequalities hold true:

θ(w) − θ(w̄) ≤ f(x0) + wᵗβ(x0) − f(x0) − w̄ᵗβ(x0) = (w − w̄)ᵗβ(x0)  (3.9)
θ(w̄) − θ(w) ≤ f(xw) + w̄ᵗβ(xw) − f(xw) − wᵗβ(xw) = (w̄ − w)ᵗβ(xw)  (3.10)

From (3.9) and (3.10) and the Schwartz inequality, it follows that

0 ≥ θ(w) − θ(w̄) − (w − w̄)ᵗβ(x0) ≥ (w − w̄)ᵗ[β(xw) − β(x0)] ≥ −‖w − w̄‖ ‖β(xw) − β(x0)‖

This further implies that

0 ≥ [θ(w) − θ(w̄) − (w − w̄)ᵗβ(x0)] / ‖w − w̄‖ ≥ −‖β(xw) − β(x0)‖  (3.11)

As w → w̄, then, by lemma 3.20, xw → x0, and by the continuity of β, β(xw) → β(x0). Therefore, from (3.11), we get

lim_{w→w̄} [θ(w) − θ(w̄) − (w − w̄)ᵗβ(x0)] / ‖w − w̄‖ = 0

Hence, θ is differentiable at w̄ with gradient β(x0).

Note 3.22: As θ is concave (theorem 3.18), by theorem 1.18 θ is subdifferentiable, i.e., it has subgradients. We shall show that these subgradients characterize directions of ascent.

Theorem 3.23: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous, so that for any w̄ ∈ R^{m+l}, X(w̄) is not empty. If x0 ∈ X(w̄), then β(x0) is a subgradient of θ at w̄.

Proof: Since f and β are continuous and X is compact, X(w̄) ≠ ∅ for any w̄ ∈ R^{m+l}. Let x0 ∈ X(w̄). Then,

θ(w) = inf {f(x) + wᵗβ(x) : x ∈ X}
 ≤ f(x0) + wᵗβ(x0)
 = f(x0) + w̄ᵗβ(x0) + (w − w̄)ᵗβ(x0)
 = θ(w̄) + (w − w̄)ᵗβ(x0)

Therefore, β(x0) is a subgradient of θ at w̄.

Example 3.24: Consider the following primal problem:

Minimize −x1 − x2
Subject to x1 + 2x2 − 3 ≤ 0
 x1, x2 = 0, 1, 2, or 3

Let g(x1, x2) = x1 + 2x2 − 3 and X = {(x1, x2) : x1, x2 = 0, 1, 2, or 3}, so that |X| = 16. Then, the dual function is given by

θ(u) = inf {−x1 − x2 + u(x1 + 2x2 − 3) : x1, x2 = 0, 1, 2, or 3}
 = min {−3u, −1 − u, −2 + u, −3 + 3u, −1 − 2u, −2, −3 + 2u, −4 + 4u, −2 − u, −3 + u, −4 + 3u, −5 + 5u, −3, −4 + 2u, −5 + 4u, −6 + 6u}

Figure 3.7 – Graph of the dual function

Hence,

θ(u) = −6 + 6u for 0 ≤ u ≤ 1/2
 −3 for 1/2 ≤ u ≤ 1
 −3u for u ≥ 1

Now let ū = 1/2. In order to find a subgradient of θ at ū, consider the following subproblem:

Minimize −x1 − x2 + ū(x1 + 2x2 − 3)
Subject to x1, x2 = 0, 1, 2, or 3

Note that the set X(ū) of optimal solutions to the above problem is {(3, 0), (3, 1), (3, 2), (3, 3)}. Thus, from theorem 3.23, g(3, 0) = 0, g(3, 1) = 2, g(3, 2) = 4, and g(3, 3) = 6 are subgradients of θ at ū. Note, however, that a value such as 1 is also a subgradient of θ at ū, but cannot be represented as g(x0) for any x0 ∈ X(ū).
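The set X(ū) and the resulting subgradients in Example 3.24 can be enumerated directly; a sketch assuming NumPy:

```python
# Enumerating X(u) and the subgradients of theta at u = 1/2 in Example 3.24.
import numpy as np

X = [(a, b) for a in range(4) for b in range(4)]   # x1, x2 in {0, 1, 2, 3}
f = lambda x: -x[0] - x[1]
g = lambda x: x[0] + 2 * x[1] - 3
u_bar = 0.5

vals = np.array([f(x) + u_bar * g(x) for x in X])
X_opt = [x for x, v in zip(X, vals) if abs(v - vals.min()) < 1e-12]
print("X(u_bar) =", X_opt)                           # (3,0), (3,1), (3,2), (3,3)
print("subgradients g(x0):", [g(x) for x in X_opt])  # 0, 2, 4, 6
```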

Note 3.25: From the above example, it is clear that theorem 3.23 gives only a sufficient characterization of subgradients.

Theorem 3.26: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous. Let w̄, d ∈ R^{m+l}. Then, the directional derivative θ′(w̄; d) of θ at w̄ in the direction d satisfies

θ′(w̄; d) ≥ dᵗβ(x0) for some x0 ∈ X(w̄)

Proof: Consider w̄ + λkd, where λk → 0⁺. For each k, there exists an xk ∈ X(w̄ + λkd); and since X is compact, there is a convergent subsequence {xk}K with limit x0 in X. Given an x ∈ X, note that

f(x) + (w̄ + λkd)ᵗβ(x) ≥ f(xk) + (w̄ + λkd)ᵗβ(xk)

for each k ∈ K. Taking the limit as k → ∞, it follows that

f(x) + w̄ᵗβ(x) ≥ f(x0) + w̄ᵗβ(x0)

i.e., x0 ∈ X(w̄). Furthermore, by the definitions of θ(w̄ + λkd) and θ(w̄), we get

θ(w̄ + λkd) − θ(w̄) = f(xk) + (w̄ + λkd)ᵗβ(xk) − θ(w̄)
 = [f(xk) + w̄ᵗβ(xk) − θ(w̄)] + λkdᵗβ(xk)
 ≥ λkdᵗβ(xk) [∵ f(xk) + w̄ᵗβ(xk) − θ(w̄) ≥ 0]

The above inequality holds true for each k ∈ K. Noting that xk → x0 as k → ∞ for k ∈ K, we get

lim_{k→∞} [θ(w̄ + λkd) − θ(w̄)] / λk ≥ dᵗβ(x0)

By lemma 1.17,

θ′(w̄; d) = lim_{λ→0⁺} [θ(w̄ + λd) − θ(w̄)] / λ

exists.

Corollary 3.27: Let ∂θ(w̄) be the collection of subgradients of θ at w̄, and suppose that the assumptions of the theorem hold true. Then,

θ′(w̄; d) = inf {dᵗξ : ξ ∈ ∂θ(w̄)}

Proof: Let x0 ∈ X(w̄). By theorem 3.23, β(x0) ∈ ∂θ(w̄); and hence, theorem 3.26 implies

θ′(w̄; d) ≥ inf {dᵗξ : ξ ∈ ∂θ(w̄)}  (3.12)

Now let ξ ∈ ∂θ(w̄). Since θ is concave,

θ(w̄ + λd) − θ(w̄) ≤ λdᵗξ

Dividing by λ > 0 and taking the limit as λ → 0⁺,

θ′(w̄; d) ≤ dᵗξ

Since this is true for each ξ ∈ ∂θ(w̄),

θ′(w̄; d) ≤ inf {dᵗξ : ξ ∈ ∂θ(w̄)}  (3.13)

Thus, (3.12) and (3.13) imply the result.

Theorem 3.28: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^{m+l} be continuous. Then, ξ is a subgradient of θ at w̄ ∈ R^{m+l} if and only if ξ belongs to the convex hull of {β(y) : y ∈ X(w̄)}.

Proof: Write

Λ = {β(y) : y ∈ X(w̄)}

and let H(Λ) be the convex hull of Λ. By theorem 3.23, Λ ⊆ ∂θ(w̄); and since ∂θ(w̄) is convex, H(Λ) ⊆ ∂θ(w̄). As X is compact and β is continuous, Λ is compact. Furthermore, the convex hull of a compact set is closed. Therefore, H(Λ) is a closed set.

We shall now show that H(Λ) ⊇ ∂θ(w̄). By contradiction, suppose ξ′ ∈ ∂θ(w̄) but ξ′ ∉ H(Λ). By theorem 1.15, there exist a scalar α and a nonzero vector d such that

dᵗβ(y) ≥ α for all y ∈ X(w̄)  (3.14)
dᵗξ′ < α  (3.15)

By theorem 3.26, there exists a y ∈ X(w̄) such that θ′(w̄; d) ≥ dᵗβ(y), and so by (3.14) we have

θ′(w̄; d) ≥ α

But corollary 3.27 and (3.15) give

θ′(w̄; d) = inf {dᵗξ : ξ ∈ ∂θ(w̄)} ≤ dᵗξ′ < α

which is a contradiction. Therefore ξ′ ∈ H(Λ), and ∂θ(w̄) = H(Λ).

Example 3.29: Consider the problem of Example 3.10:

Minimize f(x) = −2x1 + x2
Subject to h(x) = x1 + x2 − 3 = 0
 (x1, x2) ∈ X

where X = {(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)}. The dual function θ(v), v ∈ R, is as shown in Figure 3.5. Note that θ is differentiable for all v except v = −1 and v = 2. Consider v = 2, for example. The set X(2) is given by the set of alternative optimal solutions to the problem

θ(2) = min {3x2 − 6 : (x1, x2) ∈ X}

Hence X(2) = {(0, 0), (4, 0)}, with θ(2) = −6. By theorem 3.23, the subgradients of the form β(x0) for x0 ∈ X(2) are h(0, 0) = −3 and h(4, 0) = 1. Observe that, in Figure 3.5, these values are the slopes of the two affine segments defining the graph of θ that are incident at the point (v, θ(v)) = (2, −6). Therefore, as in theorem 3.28, the set of subgradients of θ at v = 2, which is given by the slopes of the two affine supports for the graph of θ, is precisely [−3, 1], the set of convex combinations of −3 and 1.

Example 3.30: Consider the following primal problem:

Minimize −(x1 − 4)² − (x2 − 4)²
Subject to x1 − 3 ≤ 0
 −x1 + x2 − 2 ≤ 0
 x1 + x2 − 4 ≤ 0
 x1, x2 ≥ 0

Let g1(x1, x2) = x1 − 3, g2(x1, x2) = −x1 + x2 − 2 and X = {(x1, x2) : x1 + x2 − 4 ≤ 0; x1, x2 ≥ 0}. Thus, the dual function is given by

θ(u1, u2) = inf {−(x1 − 4)² − (x2 − 4)² + u1(x1 − 3) + u2(−x1 + x2 − 2) : x ∈ X}

We utilize theorem 3.28 to determine the set of subgradients of θ at ū = (1, 5)ᵗ. To find the set X(ū), we need to solve the following subproblem:

Minimize −(x1 − 4)² − (x2 − 4)² − 4x1 + 5x2 − 13
Subject to x1 + x2 − 4 ≤ 0
 x1, x2 ≥ 0

Figure 3.8 – Illustration of subgradients

The objective function of this subproblem is concave and, by theorem 1.20, it assumes its minimum over a compact polyhedral set at one of the extreme points. The polyhedral set X has three extreme points, namely (0, 0), (4, 0) and (0, 4). Noting that the subproblem objective values are −45 at (0, 0) and (4, 0) and −9 at (0, 4), it is evident that the optimal solutions of the above subproblem are (0, 0) and (4, 0); i.e., X(ū) = {(0, 0), (4, 0)}. By theorem 3.28, the subgradients of θ at ū are thus given by the convex combinations of g(0, 0) and g(4, 0), i.e., by the convex combinations of the two vectors (−3, −2)ᵗ and (1, −6)ᵗ.
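This extreme-point computation is easy to reproduce; a sketch assuming NumPy:

```python
# Example 3.30 at u = (1, 5), assuming NumPy: the Lagrangian subproblem
# minimizes a concave function over a polytope, so scan the extreme points.
import numpy as np

extreme_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
f = lambda x: -(x[0] - 4) ** 2 - (x[1] - 4) ** 2
g = lambda x: np.array([x[0] - 3, -x[0] + x[1] - 2])
u = np.array([1.0, 5.0])

vals = [f(x) + u @ g(x) for x in extreme_points]
best = min(vals)
X_opt = [x for x, v in zip(extreme_points, vals) if abs(v - best) < 1e-9]
print("X(u) =", X_opt)                              # [(0,0), (4,0)], value -45
print("generators:", [tuple(g(x)) for x in X_opt])  # (-3,-2) and (1,-6)
```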

Now, we discuss two special cases of Lagrangian duality, namely linear and quadratic programming.

Example 3.31 (Linear Programming): Consider the following primal linear program:

Minimize cᵗx
Subject to Ax = b
 x ≥ 0

Let X = {x : x ≥ 0}. The Lagrangian dual of this problem is to maximize θ(v), where

θ(v) = inf {cᵗx + vᵗ(b − Ax) : x ≥ 0} = vᵗb + inf {(c − Aᵗv)ᵗx : x ≥ 0}

Clearly,

θ(v) = vᵗb if c − Aᵗv ≥ 0
 −∞ otherwise

Hence, the dual problem can be stated as follows:

Maximize vᵗb
Subject to Aᵗv ≤ c

Quadratic Programming: Consider the following quadratic programming problem:

Minimize ½xᵗHx + dᵗx
Subject to Ax ≤ b

where H is symmetric and positive semidefinite, so that the objective function is convex. The Lagrangian dual problem is to maximize θ(u) over u ≥ 0, where

θ(u) = inf {½xᵗHx + dᵗx + uᵗ(Ax − b) : x ∈ Rⁿ}  (3.16)

Note that for a given u, the function ½xᵗHx + dᵗx + uᵗ(Ax − b) is convex, and so a necessary and sufficient condition for a minimum is that the gradient must vanish, i.e.,

Hx + Aᵗu + d = 0  (3.17)

Thus, the dual problem can be written as follows:

Maximize ½xᵗHx + dᵗx + uᵗ(Ax − b)  (3.18)
Subject to Hx + Aᵗu = −d
 u ≥ 0

Now, from (3.17), we have dᵗx + uᵗAx = −xᵗHx. Substituting this into (3.18), we derive the familiar form of Dorn's dual quadratic program given below:

Maximize −½xᵗHx − bᵗu
Subject to Hx + Aᵗu = −d  (3.19)
 u ≥ 0

We now develop an alternative form of the Lagrangian dual problem under the assumption that H is positive definite, so that H⁻¹ exists. In this case, the unique solution to (3.17) is given by

x = −H⁻¹(d + Aᵗu)

Substituting in (3.16),

θ(u) = ½(d + Aᵗu)ᵗH⁻¹(d + Aᵗu) − dᵗH⁻¹(d + Aᵗu) − uᵗ[AH⁻¹(d + Aᵗu) + b]
 = ½(d + Aᵗu)ᵗH⁻¹(d + Aᵗu) − (d + Aᵗu)ᵗH⁻¹(d + Aᵗu) − uᵗb
 = −½(d + Aᵗu)ᵗH⁻¹(d + Aᵗu) − uᵗb [(H⁻¹)ᵗ = H⁻¹, since H is symmetric]
 = ½uᵗDu + uᵗc − ½dᵗH⁻¹d

where D = −AH⁻¹Aᵗ and c = −b − AH⁻¹d. The dual problem is thus given by

Maximize ½uᵗDu + uᵗc − ½dᵗH⁻¹d  (3.20)
Subject to u ≥ 0
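The closed form (3.20) can be checked on a tiny instance. Below is a sketch assuming NumPy, with hypothetical data H = I, d = 0 and the single constraint x1 + x2 ≤ −2 (primal optimum x = (−1, −1) with value 1):

```python
# Dorn's dual (3.20) for a tiny QP, assuming NumPy.
# Primal: minimize (1/2) x^t H x + d^t x  subject to  A x <= b.
import numpy as np

H = np.eye(2)
d = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([-2.0])

Hinv = np.linalg.inv(H)
D = -A @ Hinv @ A.T
c = -b - A @ Hinv @ d
theta = lambda u: 0.5 * u @ D @ u + c @ u - 0.5 * d @ Hinv @ d

us = np.linspace(0.0, 3.0, 301)
vals = [theta(np.array([t])) for t in us]
u_star = np.array([us[int(np.argmax(vals))]])
x_star = -Hinv @ (d + A.T @ u_star)        # recover x via (3.17)
print("dual:", u_star, max(vals))          # u = 1, dual value 1
print("primal point:", x_star)             # x = (-1, -1), primal value 1
```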

Reference:
1. M. S. Bazaraa, H. D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, 2004.